and nuclei of cells in vitro and in vivo by the use of novel transport
polypeptides which comprise one or more portions of HIV tat protein
and which are covalently attached to cargo molecules. The transport
polypeptides of this invention are characterized by the presence of the
tat basic region (amino acids 49-57), the absence of the tat
cysteine-rich region (amino acids 22-36) and the absence of the tat
exon 2-encoded carboxy-terminal domain (amino acids 73-86) of the
naturally-occurring tat protein. The absence of the cysteine-rich
region found in conventional tat proteins solves the problems of
spurious trans-activation and disulfide aggregation.

TAT-DERIVED TRANSPORT POLYPEPTIDES

This application is a continuation-in-part of
copending application Serial No. 07/934,375, filed
August 21, 1992.

TECHNICAL FIELD OF THE INVENTION 

This invention relates to delivery of biologically active cargo
molecules, such as polypeptides and nucleic acids, into the cytoplasm
and nuclei of cells in vitro and in vivo. Intracellular delivery of
cargo molecules according to this invention is accomplished by the use
of novel transport polypeptides which comprise one or more portions of
HIV tat protein and which are covalently attached to cargo molecules.
The transport polypeptides of this invention are characterized by the
presence of the tat basic region (amino acids 49-57), the absence of
the tat cysteine-rich region (amino acids 22-36) and the absence of
the tat exon 2-encoded carboxy-terminal domain (amino acids 73-86) of
the naturally-occurring tat protein. By virtue of the absence of the
cysteine-rich region found in conventional tat proteins, the transport
polypeptides of this invention solve the problems of spurious
trans-activation and disulfide aggregation. The reduced size of the
transport polypeptides of this invention also minimizes interference
with the biological activity of the cargo molecule.

BACKGROUND OF THE INVENTION 

Biological cells are generally impermeable to macromolecules,
including proteins and nucleic acids. Some small molecules enter
living cells at very low rates. The lack of means for delivering
macromolecules into cells in vivo has been an obstacle to the
therapeutic, prophylactic and diagnostic use of a potentially large
number of proteins and nucleic acids having intracellular sites of
action. Accordingly, most therapeutic, prophylactic and diagnostic
candidates produced to date using recombinant DNA technology are
polypeptides that act in the extracellular environment or on the
target cell surface.

Various methods have been developed for delivering macromolecules into
cells in vitro. A list of such methods includes electroporation,
membrane fusion with liposomes, high velocity bombardment with
DNA-coated microprojectiles, incubation with calcium-phosphate-DNA
precipitate, DEAE-dextran mediated transfection, infection with
modified viral nucleic acids, and direct micro-injection into single
cells. These in vitro methods typically deliver the nucleic acid
molecules into only a fraction of the total cell population, and they
tend to damage large numbers of cells. Experimental delivery of
macromolecules into cells in vivo has been accomplished with scrape
loading, calcium phosphate precipitates and liposomes. However, these
techniques have, to date, shown limited usefulness for in vivo
cellular delivery. Moreover, even with cells in vitro, such methods
are of extremely limited usefulness for delivery of proteins.

General methods for efficient delivery of biologically active
proteins into intact cells, in vitro and in vivo, are needed (L.A.
Sternson, "Obstacles to Polypeptide Delivery", Ann. N.Y. Acad. Sci.,
57, pp. 19-21 (1987)). Chemical addition of a lipopeptide (P. Hoffmann
et al., "Stimulation of Human and Murine Adherent Cells by Bacterial
Lipoprotein and Synthetic Lipopeptide Analogues", Immunobiol., 177,
pp. 158-70 (1988)) or a basic polymer such as polylysine or
polyarginine (W-C. Chen et al., "Conjugation of Poly-L-Lysine Albumin
and Horseradish Peroxidase: A Novel Method of Enhancing the Cellular
Uptake of Proteins", Proc. Natl. Acad. Sci. USA, 75, pp. 1872-76
(1978)) has not proved to be highly reliable or generally useful (see
Example 4, infra). Folic acid has been used as a transport moiety
(C.P. Leamon and Low, "Delivery of Macromolecules into Living Cells: A
Method That Exploits Folate Receptor Endocytosis", Proc. Natl. Acad.
Sci. USA, 88, pp. 5572-76 (1991)). Evidence was presented for
internalization of folate conjugates, but not for cytoplasmic
delivery. Given the high levels of circulating folate in vivo, the
usefulness of this system has not been fully demonstrated. Pseudomonas
exotoxin has also been used as a transport moiety (T.I. Prior et al.,
"Barnase Toxin: A New Chimeric Toxin Composed of Pseudomonas Exotoxin
A and Barnase", Cell, 64, pp. 1017-23 (1991)). The efficiency and
general applicability of this system are not clear from the published
work, however.

The tat protein of human immunodeficiency virus type-1 ("HIV") has
demonstrated potential for delivery of cargo proteins into cells
(published PCT application WO 91/09958). However, given the chemical
properties of the full-length tat protein, generally applicable
methods for its efficient use in delivery of biologically active cargo
are not taught in the art.

Tat is an HIV-encoded protein that trans-activates certain HIV genes
and is essential for viral replication. The full-length HIV-1 tat
protein has 86 amino acid residues. The HIV tat gene has two exons.
Tat amino acids 1-72 are encoded by exon 1, and amino acids 73-86 are
encoded by exon 2. The full-length tat protein is characterized by a
basic region which contains two lysines and six arginines (amino acids
49-57) and a cysteine-rich region which contains seven cysteine
residues (amino acids 22-37). Purified tat protein is taken up from
the surrounding medium by human cells growing in culture (A.D. Frankel
and C.O. Pabo, "Cellular Uptake of the Tat Protein from Human
Immunodeficiency Virus", Cell, 55, pp. 1189-93 (1988)). The art does
not teach whether the cysteine-rich region of tat protein (which
causes aggregation and insolubility) is required for cellular uptake
of tat protein.
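
The residue numbering used in this description can be checked mechanically. The following is a minimal sketch only. It assumes the widely cited 86-residue HIV-1 (HXB2) tat reference sequence, which is not reproduced in this text and may differ from SEQ ID NO:1; the names TAT_HXB2 and region are illustrative and are not part of the disclosure.

```python
from collections import Counter

# ASSUMPTION: commonly cited HXB2 reference sequence for HIV-1 tat (86 aa).
# It is not taken from this document and may differ from SEQ ID NO:1.
TAT_HXB2 = (
    "MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFITKALGISYGRKKRRQRRRPPQGSQTHQVSLSKQ"
    "PTSQSRGDPTGPKE"
)


def region(seq: str, first: int, last: int) -> str:
    """Return residues first..last using 1-based, inclusive numbering."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError(f"residue range {first}-{last} is outside the sequence")
    return seq[first - 1:last]


exon1 = region(TAT_HXB2, 1, 72)      # encoded by exon 1
exon2 = region(TAT_HXB2, 73, 86)     # encoded by exon 2
basic = region(TAT_HXB2, 49, 57)     # basic region
cys_rich = region(TAT_HXB2, 22, 37)  # cysteine-rich region

basic_counts = Counter(basic)
print("total / exon 1 / exon 2 lengths:", len(TAT_HXB2), len(exon1), len(exon2))
print("basic region:", basic, "Lys:", basic_counts["K"], "Arg:", basic_counts["R"])
print("cysteine-rich region:", cys_rich, "Cys:", cys_rich.count("C"))
```

Under that assumed sequence, the lysine, arginine and cysteine counts agree with the figures stated above.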

PCT patent application WO 91/09958 ("the '958 application") discloses
that a heterologous protein consisting of amino acids 1-67 of HIV tat
protein genetically fused to a papillomavirus E2 trans-activation
repressor polypeptide is taken up by cultured cells. However,
preservation of the cargo polypeptide's biological activity
(repression of E2 trans-activation) is not demonstrated therein.

The use of tat protein, as taught in the '958 application, potentially
involves practical difficulties when used for cellular delivery of
cargo proteins. Those practical difficulties include protein
aggregation and insolubility involving the cysteine-rich region of tat
protein. Furthermore, the '958 application provides no examples of
chemical cross-linking of tat protein to cargo proteins, which may be
critical in situations where genetic fusion of tat to the cargo
protein interferes with proper folding of the tat protein, the cargo
protein, or both. In addition, both the '958 application and Frankel
and Pabo (supra) teach the use of tat transport proteins in
conjunction with chloroquine, which is cytotoxic. The need exists,
therefore, for generally applicable means for safe, efficient delivery
of biologically active cargo molecules into the cytoplasm and nuclei
of living cells.

SUMMARY OF THE INVENTION 

This invention solves the problems set forth above by providing
processes and products for the efficient cytoplasmic and nuclear
delivery of biologically active non-tat proteins, nucleic acids and
other molecules that are (1) not inherently capable of entering target
cells or cell nuclei, or (2) not inherently capable of entering target
cells at a useful rate. Intracellular delivery of cargo molecules
according to this invention is accomplished by the use of novel
transport proteins which comprise one or more portions of HIV tat
protein and which are covalently attached to the cargo molecules. More
particularly, this invention relates to novel transport polypeptides,
methods for making those transport polypeptides, transport
polypeptide-cargo conjugates, pharmaceutical, prophylactic and
diagnostic compositions comprising transport polypeptide-cargo
conjugates and methods for delivery of cargo into cells by means of
tat-related transport polypeptides.

The transport polypeptides of this invention are characterized by the
presence of the tat basic region amino acid sequence (amino acids
49-57 of naturally-occurring tat protein); the absence of the tat
cysteine-rich region amino acid sequence (amino acids 22-36 of
naturally-occurring tat protein) and the absence of the tat exon
2-encoded carboxy-terminal domain (amino acids 73-86 of
naturally-occurring tat protein). Preferred embodiments of such
transport polypeptides are: tat37-72 (SEQ ID NO:2), tat37-58 (SEQ ID
NO:3), tat38-58GGC (SEQ ID NO:4), tatCGG47-58 (SEQ ID NO:5),
tat47-58GGC (SEQ ID NO:6), and tatΔcys (SEQ ID NO:7). It will be
recognized by those of ordinary skill in the art that when the
transport polypeptide is genetically fused to the cargo moiety, an
amino-terminal methionine must be added, but the spacer amino acids
(e.g., CysGlyGly or GlyGlyCys) need not be added. By virtue of the
absence of the cysteine-rich region present in conventional tat
proteins, transport polypeptides of this invention solve the problem
of disulfide aggregation, which can result in loss of the cargo's
biological activity, insolubility of the transport polypeptide-cargo
conjugate, or both. The reduced size of the transport polypeptides of
this invention also advantageously minimizes interference with the
biological activity of the cargo. A further advantage of the reduced
transport polypeptide size is enhanced uptake efficiency in
embodiments of this invention involving attachment of multiple
transport polypeptides per cargo molecule.

Transport polypeptides of this invention may be advantageously
attached to cargo molecules by chemical cross-linking or by genetic
fusion. According to preferred embodiments of this invention, the
transport polypeptide and the cargo molecule are chemically
cross-linked. A unique terminal cysteine residue is a preferred means
of chemical cross-linking. According to other preferred embodiments of
this invention, the carboxy terminus of the transport moiety is
genetically fused to the amino terminus of the cargo moiety. A
particularly preferred embodiment of the present invention is JB106,
which consists of an amino-terminal methionine followed by tat
residues 47-58, followed by HPV-16 E2 residues 245-365.
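
As a worked check of the JB106 composition just described, the segment lengths can be added using inclusive residue numbering. This is a sketch under the assumption that no linker residues lie between the segments (none are mentioned here); the sequence in Figure 12 remains authoritative, and the names below are illustrative.

```python
# (label, first residue, last residue), inclusive, as described in the text.
# ASSUMPTION: no additional linker residues between the three segments.
JB106_SEGMENTS = [
    ("amino-terminal Met", 1, 1),
    ("tat 47-58", 47, 58),
    ("HPV-16 E2 245-365", 245, 365),
]


def segment_length(first: int, last: int) -> int:
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


def total_length(segments) -> int:
    """Sum of the inclusive lengths of all segments."""
    return sum(segment_length(first, last) for _, first, last in segments)


for label, first, last in JB106_SEGMENTS:
    print(f"{label}: {segment_length(first, last)} residues")
print("expected JB106 length:", total_length(JB106_SEGMENTS))
```

On these assumptions the fusion is 1 + 12 + 121 = 134 residues long.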

In many cases, the novel transport polypeptides of this invention
advantageously avoid chloroquine-associated toxicity. According to one
preferred embodiment of this invention, a biologically active cargo is
delivered into the cells of various organs and tissues following
introduction of a transport polypeptide-cargo conjugate into a live
human or animal. By virtue of the foregoing features, this invention
opens the way for biological research and disease therapy involving
proteins, nucleic acids and other molecules with cytoplasmic or
nuclear sites of action.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts the amino acid sequence of HIV-1 tat protein (SEQ ID
NO:1).

Figure 2 summarizes the results of cellular uptake experiments with
transport polypeptide-Pseudomonas exotoxin ribosylation domain
conjugates (shaded bars, unconjugated; diagonally-hatched bars,
conjugated).

Figure 3 summarizes the results of cellular uptake experiments with
transport polypeptide-ribonuclease conjugates (closed squares,
ribonuclease-SMCC without transport moiety; closed circles,
tat37-72-ribonuclease; closed triangles, tat38-58GGC-ribonuclease;
closed diamonds, tatCGG38-58-ribonuclease; open squares,
tatCGG47-58-ribonuclease).

Figure 4 schematically depicts the construction of plasmid pAHE2.

Figure 5 schematically depicts the construction of plasmid pET8c123.

Figure 6 schematically depicts the construction of plasmid
pET8c123CCSS.

Figure 7 summarizes the results of cellular uptake experiments with
transport polypeptide-E2 repressor conjugates (open diamonds, E2.123
cross-linked to tat37-72, without chloroquine; closed diamonds, E2.123
cross-linked to tat37-72, with chloroquine; open circles, E2.123CCSS
cross-linked to tat37-72, without chloroquine; closed circles,
E2.123CCSS cross-linked to tat37-72, with chloroquine).

Figure 8 schematically depicts the construction of plasmid pTATΔcys.

Figure 9 schematically depicts the construction of plasmid pFTE501.

Figure 10 schematically depicts the construction of plasmid
pTATΔcys-249.

Figure 11 schematically depicts the construction of plasmid pJB106.

Figure 12 depicts the complete amino acid sequence of protein JB106.

Figure 13 summarizes the results of E2 repression assays involving
JB106 (squares), TxHE2CCSS (diamonds) and HE2.123 (circles). The
assays were carried out in COS7 cells, without chloroquine, as
described in Example 14.

DETAILED DESCRIPTION OF THE INVENTION

In order that the invention herein described 
may be more fully understood, the following detailed 
description is set forth. 

In the description, the following terms are employed:

WO 94/04686

PCT/US93/07833



Amino acid — A monomeric unit of a peptide, polypeptide or protein.
The twenty protein amino acids (L-isomers) are: alanine ("Ala" or
"A"), arginine ("Arg" or "R"), asparagine ("Asn" or "N"), aspartic
acid ("Asp" or "D"), cysteine ("Cys" or "C"), glutamine ("Gln" or
"Q"), glutamic acid ("Glu" or "E"), glycine ("Gly" or "G"), histidine
("His" or "H"), isoleucine ("Ile" or "I"), leucine ("Leu" or "L"),
lysine ("Lys" or "K"), methionine ("Met" or "M"), phenylalanine
("Phe" or "F"), proline ("Pro" or "P"), serine ("Ser" or "S"),
threonine ("Thr" or "T"), tryptophan ("Trp" or "W"), tyrosine ("Tyr"
or "Y") and valine ("Val" or "V"). The term amino acid, as used
herein, also includes analogs of the protein amino acids, and
D-isomers of the protein amino acids and their analogs.

Cargo — A molecule that is not a tat protein or a fragment thereof,
and that is either (1) not inherently capable of entering target
cells, or (2) not inherently capable of entering target cells at a
useful rate. ("Cargo", as used in this application, refers either to a
molecule, per se, i.e., before conjugation, or to the cargo moiety of
a transport polypeptide-cargo conjugate.) Examples of "cargo" include,
but are not limited to, small molecules and macromolecules, such as
polypeptides, nucleic acids and polysaccharides.

Chemical cross-linking — Covalent bonding of 
two or more pre-formed molecules. 

Cargo conjugate — A molecule comprising at least one transport
polypeptide moiety and at least one cargo moiety, formed either
through genetic fusion or chemical cross-linking of a transport
polypeptide and a cargo molecule.

Genetic fusion — Co-linear, covalent linkage of two or more proteins
via their polypeptide backbones, through genetic expression of
contiguous DNA sequences encoding the proteins.

Macromolecule — A molecule, such as a peptide, polypeptide, protein or
nucleic acid.

Polypeptide — Any polymer consisting essentially of any of the 20
protein amino acids (above), regardless of its size. Although
"protein" is often used in reference to relatively large polypeptides,
and "peptide" is often used in reference to small polypeptides, usage
of these terms in the art overlaps and varies. The term "polypeptide"
as used herein refers to peptides, polypeptides and proteins, unless
otherwise noted.

Reporter gene — A gene the expression of which depends on the
occurrence of a cellular event of interest, and the expression of
which can be conveniently observed in a genetically transformed host
cell.

Reporter plasmid — A plasmid vector comprising one or more reporter
genes.

Small molecule — A molecule other than a macromolecule.

Spacer amino acid — An amino acid (preferably having a small side
chain) included between a transport moiety and an amino acid residue
used for chemical cross-linking (e.g., to provide molecular
flexibility and avoid steric hindrance).

Target cell — A cell into which a cargo is delivered by a transport
polypeptide. A "target cell" may be any cell, including human cells,
either in vivo or in vitro.

Transport moiety or transport polypeptide — 
A polypeptide capable of delivering a covalently 
attached cargo into a target cell.

This invention is generally applicable for therapeutic, prophylactic
or diagnostic intracellular delivery of small molecules and
macromolecules, such as proteins, nucleic acids and polysaccharides,
that are not inherently capable of entering target cells at a useful
rate. It should be appreciated, however, that alternate embodiments of
this invention are not limited to clinical applications. This
invention may be advantageously applied in medical and biological
research. In research applications of this invention, the cargo may be
a drug or a reporter molecule. Transport polypeptides of this
invention may be used as research laboratory reagents, either alone or
as part of a transport polypeptide conjugation kit.

The target cells may be in vivo cells, i.e., cells composing the
organs or tissues of living animals or humans, or microorganisms found
in living animals or humans. The target cells may also be in vitro
cells, i.e., cultured animal cells, human cells or microorganisms.

Wide latitude exists in the selection of drugs and reporter molecules
for use in the practice of this invention. Factors to be considered in
selecting reporter molecules include, but are not limited to, the type
of experimental information sought, non-toxicity, convenience of
detection, quantifiability of detection, and availability. Many such
reporter molecules are known to those skilled in the art.

As will be appreciated from the examples presented below, we have used
enzymes for which colorimetric assays exist, as model cargo to
demonstrate the operability and useful features of the transport
polypeptides of this invention. These enzyme cargos provide for
sensitive, convenient, visual detection of cellular uptake.
Furthermore, since visual readout occurs only if the enzymatic
activity of the cargo is preserved, these enzymes provide a sensitive
and reliable test for preservation of biological activity of the cargo
moiety in transport polypeptide-cargo conjugates according to this
invention. A preferred embodiment of this invention comprises
horseradish peroxidase ("HRP") as the cargo moiety of the transport
polypeptide-cargo conjugate. A particularly preferred model cargo
moiety for practice of this invention is β-galactosidase.

Model cargo proteins may also be selected according to their site of
action within the cell. As described in Examples 6 and 7, below, we
have used the ADP ribosylation domain from Pseudomonas exotoxin ("PE")
and pancreatic ribonuclease to confirm cytoplasmic delivery of
properly folded cargo proteins by transport polypeptides according to
this invention.

Full-length Pseudomonas exotoxin is itself capable of entering cells,
where it inactivates ribosomes by means of an ADP ribosylation
reaction, thus killing the cells. A portion of the Pseudomonas
exotoxin protein known as the ADP ribosylation domain is incapable of
entering cells, but it retains the ability to inactivate ribosomes if
brought into contact with them. Thus, cell death induced by transport
polypeptide-PE ADP ribosylation domain conjugates is a test for
cytoplasmic delivery of the cargo by the transport polypeptide.

We have also used ribonuclease to confirm cytoplasmic delivery of a
properly folded cargo protein by transport polypeptides of this
invention. Protein synthesis, an RNA-dependent process, is highly
sensitive to ribonuclease, which digests RNA. Ribonuclease is, by
itself, incapable of entering cells, however. Thus, inhibition of
protein synthesis by a transport polypeptide-ribonuclease conjugate is
a test for intracellular delivery of biologically active ribonuclease.

Of course, delivery of a given cargo molecule to the cytoplasm may be
followed by further delivery of the same cargo molecule to the
nucleus. Nuclear delivery necessarily involves traversing some portion
of the cytoplasm.

Papillomavirus E2 repressor proteins are examples of macromolecular
drugs that may be delivered into the nuclei of target cells by the
transport polypeptides of this invention. Papillomavirus E2 protein,
which normally exists as a homodimer, regulates both transcription and
replication of the papillomavirus genome. The carboxy-terminal domain
of the E2 protein contains DNA binding and dimerization activities.
Transient expression of DNA sequences encoding various E2 analogs or
E2 carboxy-terminal fragments in transfected mammalian cells inhibits
trans-activation by the full-length E2 protein (J. Barsoum et al.,
"Mechanism of Action of the Papillomavirus E2 Repressor: Repression in
the Absence of DNA Binding", J. Virol., 66, pp. 3941-3945 (1992)). E2
repressors added to the growth medium of cultured mammalian cells do
not enter the cells, and thus do not inhibit E2 trans-activation in
those cells. However, conjugation of the transport polypeptides of
this invention to E2 repressors results in translocation of the E2
repressors from the growth medium into the cultured cells, where they
display biological activity, repressing E2-dependent expression of a
reporter gene.

The rate at which single-stranded and double-stranded nucleic acids
enter cells, in vitro and in vivo, may be advantageously enhanced,
using the transport polypeptides of this invention. As shown in
Example 11 (below), methods for chemical cross-linking of polypeptides
to nucleic acids are well known in the art. In a preferred embodiment
of this invention, the cargo is a single-stranded antisense nucleic
acid. Antisense nucleic acids are useful for inhibiting cellular
expression of sequences to which they are complementary. In another
embodiment of this invention, the cargo is a double-stranded nucleic
acid comprising a binding site recognized by a nucleic acid-binding
protein. An example of such a nucleic acid-binding protein is a viral
trans-activator.

Naturally-occurring HIV-1 tat protein (Figure 1) has a region (amino
acids 22-37) wherein 7 out of 16 amino acids are cysteine. Those
cysteine residues are capable of forming disulfide bonds with each
other, with cysteine residues in the cysteine-rich region of other tat
protein molecules and with cysteine residues in a cargo protein or the
cargo moiety of a conjugate. Such disulfide bond formation can cause
loss of the cargo's biological activity. Furthermore, even if there is
no potential for disulfide bonding to the cargo moiety (for example,
when the cargo protein has no cysteine residues), disulfide bond
formation between transport polypeptides leads to aggregation and
insolubility of the transport polypeptide, the transport
polypeptide-cargo conjugate, or both. The tat cysteine-rich region is
potentially a source of serious problems in the use of
naturally-occurring tat protein for cellular delivery of cargo
molecules.

The cysteine-rich region is required for dimerization of tat in vitro,
and is required for trans-activation of HIV DNA sequences. Therefore,
removal of the tat cysteine-rich region has the additional advantage
of eliminating the natural activity of tat, i.e., induction of HIV
transcription and replication. However, the art does not teach whether
the cysteine-rich region of the tat protein is required for cellular
uptake.

The present invention includes embodiments wherein the problems
associated with the tat cysteine-rich region are solved, because that
region is not present in the transport polypeptides described herein.
In those embodiments, cellular uptake of the transport polypeptide or
transport polypeptide-cargo molecule conjugate still occurs. In one
group of preferred embodiments of this invention, the sequence of
amino acids preceding the cysteine-rich region is fused directly to
the sequence of amino acids following the cysteine-rich region. Such
transport polypeptides are called tatΔcys, and have the general
formula (tat1-21)-(tat38-n), where n is the number of the
carboxy-terminal residue, i.e., 49-86. Preferably, n is 58-72. As will
be appreciated from the examples below, the amino acid sequence
preceding the cysteine-rich region of the tat protein is not required
for cellular uptake. A preferred transport polypeptide (or transport
moiety) consists of amino acids 37-72 of tat protein, and is called
tat37-72 (SEQ ID NO:2). Retention of tat residue 37, a cysteine, at
the amino terminus of the transport polypeptide is preferred, because
it is useful for chemical cross-linking.
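
The general formula (tat1-21)-(tat38-n) can be expressed as a list of retained residue numbers. The following sketch is illustrative only: the function names are hypothetical, and the 86-character string used in the demonstration is a placeholder with one letter per region, not the tat sequence.

```python
from typing import List


def tat_delta_cys_positions(n: int) -> List[int]:
    """Residue numbers (1-based) retained in (tat1-21)-(tat38-n)."""
    if not 49 <= n <= 86:
        raise ValueError("n must be between 49 and 86")
    # Residues 1-21 are joined directly to residues 38-n; 22-37 are dropped.
    return list(range(1, 22)) + list(range(38, n + 1))


def apply_positions(full_sequence: str, positions: List[int]) -> str:
    """Pick the listed 1-based residues out of a full-length sequence."""
    return "".join(full_sequence[p - 1] for p in positions)


# PLACEHOLDER string: 'a' = residues 1-21, 'c' = 22-37, 'b' = 38-86.
placeholder = "a" * 21 + "c" * 16 + "b" * 49

for n in (58, 72):
    kept = tat_delta_cys_positions(n)
    print(f"n={n}: {len(kept)} residues ->", apply_positions(placeholder, kept))

# tat37-72 keeps residue 37, the cysteine preferred for cross-linking.
tat37_72 = list(range(37, 73))
print("tat37-72 length:", len(tat37_72))
```

Note that the formula starts the second segment at residue 38, so a polypeptide built this way does not retain the cysteine at position 37, whereas tat37-72 does.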

The advantages of the tatΔcys polypeptides, tat37-72 and other
embodiments of this invention include the following:

a) The natural activity of tat protein, i.e., induction of HIV
transcription, is eliminated;

b) Dimers and higher multimers of the transport polypeptide are
avoided;

c) The level of expression of tatΔcys genetic fusions in E. coli may
be improved;

d) Some transport polypeptide conjugates display increased solubility
and superior ease of handling; and

e) Some fusion proteins display increased activity by the cargo
moiety, as compared with fusions containing the cysteine-rich region.

Numerous chemical cross-linking methods are known and potentially
applicable for conjugating the transport polypeptides of this
invention to cargo macromolecules. Many known chemical cross-linking
methods are non-specific, i.e., they do not direct the point of
coupling to any particular site on the transport polypeptide or cargo
macromolecule. As a result, non-specific cross-linking agents may
attack functional sites or sterically block active sites, rendering
the conjugated proteins biologically inactive.

A preferred approach to increasing coupling specificity in the
practice of this invention is direct chemical coupling to a functional
group found only once or a few times in one or both of the
polypeptides to be cross-linked. For example, in many proteins,
cysteine, which is the only protein amino acid containing a thiol
group, occurs only a few times. Also, for example, if a polypeptide
contains no lysine residues, a cross-linking reagent specific for
primary amines will be selective for the amino terminus of that
polypeptide. Successful utilization of this approach to increase
coupling specificity requires that the polypeptide have the suitably
rare and reactive residues in areas of the molecule that may be
altered without loss of the molecule's biological activity.

As demonstrated in the examples below, cysteine residues may be
replaced when they occur in parts of a polypeptide sequence where
their participation in a cross-linking reaction would likely interfere
with biological activity. When a cysteine residue is replaced, it is
typically desirable to minimize resulting changes in polypeptide
folding. Changes in polypeptide folding are minimized when the
replacement is chemically and sterically similar to cysteine. For
these reasons, serine is preferred as a replacement for cysteine. As
demonstrated in the examples below, a cysteine residue may be
introduced into a polypeptide's amino acid sequence for cross-linking
purposes. When a cysteine residue is introduced, introduction at or
near the amino or carboxy terminus is preferred. Conventional methods
are available for such amino acid sequence modifications, whether the
polypeptide of interest is produced by chemical synthesis or
expression of recombinant DNA.

Cross-linking reagents may be homobifunctional, i.e., having two
functional groups that undergo the same reaction. A preferred
homobifunctional cross-linking reagent is bismaleimidohexane ("BMH").
BMH contains two maleimide functional groups, which react specifically
with sulfhydryl-containing compounds under mild conditions (pH
6.5-7.7). The two maleimide groups are connected by a hydrocarbon
chain. Therefore, BMH is useful for irreversible cross-linking of
polypeptides that contain cysteine residues.

Cross-linking reagents may also be heterobifunctional.
Heterobifunctional cross-linking agents have two different functional
groups, for example an amine-reactive group and a thiol-reactive
group, that will cross-link two proteins having free amines and
thiols, respectively. Examples of heterobifunctional cross-linking
agents are succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate
("SMCC"), m-maleimidobenzoyl-N-hydroxysuccinimide ester ("MBS"), and
succinimidyl 4-(p-maleimidophenyl)butyrate ("SMPB"), an extended chain
analog of MBS. The succinimidyl group of these cross-linkers reacts
with a primary amine, and the thiol-reactive maleimide forms a
covalent bond with the thiol of a cysteine residue.

Cross-linking reagents often have low solubility in water. A
hydrophilic moiety, such as a sulfonate group, may be added to the
cross-linking reagent to improve its water solubility. Sulfo-MBS and
sulfo-SMCC are examples of cross-linking reagents modified for water
solubility.

Many cross-linking reagents yield a conjugate that is essentially
non-cleavable under cellular conditions. However, some cross-linking
reagents contain a covalent bond, such as a disulfide, that is
cleavable under cellular conditions. For example,
dithiobis(succinimidylpropionate) ("DSP"), Traut's reagent and
N-succinimidyl 3-(2-pyridyldithio)propionate ("SPDP") are well-known
cleavable cross-linkers. The use of a cleavable cross-linking reagent
permits the cargo moiety to separate from the transport polypeptide
after delivery into the target cell. Direct disulfide linkage may also
be useful.

Some new cross-linking reagents, such as
N-γ-maleimidobutyryloxy-succinimide ester ("GMBS") and
sulfo-GMBS, have reduced immunogenicity. In some
embodiments of the present invention, such reduced
immunogenicity may be advantageous.

Numerous cross-linking reagents, including
the ones discussed above, are commercially available.
Detailed instructions for their use are readily
available from the commercial suppliers. A general
reference on protein cross-linking and conjugate
preparation is: S.S. Wong, Chemistry of Protein
Conjugation and Cross-Linking, CRC Press (1991).

Chemical cross-linking may include the use of
spacer arms. Spacer arms provide intramolecular
flexibility or adjust intramolecular distances between
conjugated moieties and thereby may help preserve
biological activity. A spacer arm may be in the form
of a polypeptide moiety comprising spacer amino acids.
Alternatively, a spacer arm may be part of the
cross-linking reagent, such as in "long-chain SPDP" (Pierce
Chem. Co., Rockford, IL, cat. no. 21651 H).

The pharmaceutical compositions of this
invention may be for therapeutic, prophylactic or
diagnostic applications, and may be in a variety of
forms. These include, for example, solid, semi-solid,
and liquid dosage forms, such as tablets, pills,
powders, liquid solutions or suspensions, aerosols,
liposomes, suppositories, injectable and infusible
solutions and sustained release forms. The preferred
form depends on the intended mode of administration and
the therapeutic, prophylactic or diagnostic
application. The transport polypeptide-cargo molecule
conjugates of this invention may be administered by
conventional routes of administration, such as
parenteral, subcutaneous, intravenous, intramuscular,
intralesional or aerosol routes. The compositions also
preferably include conventional pharmaceutically
acceptable carriers and adjuvants that are known to
those of skill in the art.

Generally, the pharmaceutical compositions of
the present invention may be formulated and
administered using methods and compositions similar to
those used for pharmaceutically important polypeptides
such as, for example, alpha interferon. It will be
understood that conventional doses will vary depending
upon the particular cargo involved.

The processes and compositions of this
invention may be applied to any organism, including
humans. The processes and compositions of this
invention may also be applied to animals and humans
in utero.

For many pharmaceutical applications of this
invention, it is necessary for the cargo molecule to be
translocated from body fluids into cells of tissues in
the body, rather than from a growth medium into
cultured cells. Therefore, in addition to examples
below involving cultured cells, we have provided
examples demonstrating delivery of model cargo proteins
into cells of various mammalian organs and tissues,
following intravenous injection of transport
polypeptide-cargo protein conjugates into live animals.
These cargo proteins display biological activity
following delivery into the cells in vivo.

As demonstrated in the examples that follow,
using the amino acid and DNA sequence information
provided herein, the transport polypeptides of this
invention may be chemically synthesized or produced by
recombinant DNA methods. Methods for chemical
synthesis or recombinant DNA production of polypeptides
having a known amino acid sequence are well known.
Automated equipment for polypeptide or DNA synthesis is
commercially available. Host cells, cloning vectors,
DNA expression control sequences and oligonucleotide
linkers are also commercially available.

Using well-known techniques, one of skill in
the art can readily make minor additions, deletions or
substitutions in the preferred transport polypeptide
amino acid sequences set forth herein. It should be
understood, however, that such variations are within
the scope of this invention.

Furthermore, tat proteins from other viruses,
such as HIV-2 (M. Guyader et al., "Genome Organization
and Transactivation of the Human Immunodeficiency Virus
Type 2", Nature, 326, pp. 662-669 (1987)), equine
infectious anemia virus (R. Carroll et al.,
"Identification of Lentivirus Tat Functional Domains
Through Generation of Equine Infectious Anemia
Virus/Human Immunodeficiency Virus Type 1 tat Gene
Chimeras", J. Virol., 65, pp. 3460-67 (1991)), and
simian immunodeficiency virus (L. Chakrabarti et al.,
"Sequence of Simian Immunodeficiency Virus from Macaque
and Its Relationship to Other Human and Simian
Retroviruses", Nature, 328, pp. 543-47 (1987); S.K.
Arya et al., "New Human and Simian HIV-Related
Retroviruses Possess Functional Transactivator (tat)
Gene", Nature, 328, pp. 548-550 (1987)) are known. It
should be understood that polypeptides derived from
those tat proteins and characterized by the presence of
the tat basic region and the absence of the tat
cysteine-rich region fall within the scope of the
present invention.

In order that the invention described herein
may be more fully understood, the following examples
are set forth. It should be understood that these
examples are for illustrative purposes only and are not
to be construed as limiting this invention in any
manner. Throughout these examples, all molecular
cloning reactions were carried out according to methods
in J. Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2nd Edition, Cold Spring Harbor Laboratory
(1989), except where otherwise noted.

EXAMPLE 1 

Production and Purification
of Transport Polypeptides

Recombinant DNA 

Plasmid pTat72 was a starting clone for
bacterial production of tat-derived transport
polypeptides and construction of genes encoding
transport polypeptide-cargo protein fusions. We
obtained plasmid pTat72 (described in Frankel and Pabo,
supra) from Alan Frankel (The Whitehead Institute for
Biomedical Research, Cambridge, MA). Plasmid pTat72
was derived from the pET-3a expression vector of F.W.
Studier et al. ("Use of T7 RNA Polymerase to Direct
Expression of Cloned Genes", Methods Enzymol., 185,
pp. 60-90 (1990)) by insertion of a synthetic gene
encoding amino acids 1 to 72 of HIV-1 tat. The tat
coding region employs E. coli codon usage and is driven
by the bacteriophage T7 polymerase promoter inducible
with isopropyl beta-D-thiogalactopyranoside ("IPTG").
Tat protein constituted 5% of total E. coli protein
after IPTG induction.

Purification of Tat1-72 from Bacteria

We suspended E. coli expressing tat1-72
protein in 10 volumes of 25 mM Tris-HCl (pH 7.5), 1 mM
EDTA. We lysed the cells in a French press and removed
the insoluble debris by centrifugation at 10,000 x g
for 1 hour. We loaded the supernatant onto a Q
Sepharose Fast Flow (Pharmacia LKB, Piscataway, NJ) ion
exchange column (20 ml resin/60 ml lysate). We treated
the flow-through fraction with 0.5 M NaCl, which caused
the tat protein to precipitate. We collected the
salt-precipitated protein by centrifugation at 35,000 rpm,
in a 50.2 rotor, for 1 hour. We dissolved the pelleted
precipitate in 6 M guanidine-HCl and clarified the
solution by centrifugation at 35,000 rpm, in a 50.2
rotor, for 1 hour. We loaded the clarified sample onto
an A.5 agarose gel filtration column equilibrated with
6 M guanidine-HCl, 50 mM sodium phosphate (pH 5.4),
10 mM DTT, and then eluted the sample with the same
buffer. We loaded the tat protein-containing gel
filtration fractions onto a reverse phase HPLC
column and eluted with a gradient of 0-75%
acetonitrile, 0.1% trifluoroacetic acid. Using this
procedure, we produced about 20 mg of tat1-72 protein
per liter of E. coli culture (assuming 6 g of cells per
liter). This represented an overall yield of about
50%.

Upon SDS-PAGE analysis, the tat1-72
polypeptide migrated as a single band of 10 kD. The
purified tat1-72 polypeptide was active in an
uptake/transactivation assay. We added the polypeptide
to the culture medium of human hepatoma cells
containing a tat-responsive tissue plasminogen
activator ("tPA") reporter gene. In the presence of
0.1 mM chloroquine, the purified tat1-72 protein
(100 ng/ml) induced tPA expression approximately
150-fold.

Chemical Synthesis of Transport Polypeptides 

For chemical synthesis of the various
transport polypeptides, we used a commercially
available, automated system (Applied Biosystems Model
430A synthesizer) and followed the system
manufacturer's recommended procedures. We removed
blocking groups by HF treatment and isolated the
synthetic polypeptides by conventional reverse phase
HPLC methods. The integrity of all synthetic
polypeptides was confirmed by mass spectrometric
analysis.

EXAMPLE 2 

β-Galactosidase Conjugates

Chemical Cross-Linking with SMCC 

For acetylation of β-galactosidase (to block
cysteine sulfhydryl groups) we dissolved 6.4 mg of
commercially obtained β-galactosidase (Pierce Chem.
Co., cat. no. 32101G) in 200 µl of 50 mM phosphate
buffer (pH 7.5). To the 200 µl of β-galactosidase
solution, we added 10 µl of iodoacetic acid, prepared
by dissolving 30 mg of iodoacetic acid in 4 ml of 50 mM
phosphate buffer (pH 7.5). (In subsequent experiments
we found iodoacetamide to be a preferable substitute
for iodoacetic acid.) We allowed the reaction to
proceed for 60 minutes at room temperature. We then
separated the acetylated β-galactosidase from the
unreacted iodoacetic acid by loading the reaction
mixture on a small G-25 (Pharmacia LKB,
Piscataway, NJ) gel filtration column and collecting
the void volume.

Prior to SMCC activation of the amine groups
of the acetylated β-galactosidase, we concentrated 2 ml
of the enzyme collected from the G-25 column to 0.3 ml
in a Centricon 10 (Amicon, Danvers, MA) ultrafiltration
apparatus. To the concentrated acetylated
β-galactosidase, we added 19 µg of sulfo-SMCC (Pierce
Chem. Co., cat. no. 22322G) dissolved in 15 µl of
dimethylformamide ("DMF"). We allowed the reaction to
proceed for 30 minutes at room temperature. We then
separated the β-galactosidase-SMCC from the DMF and
unreacted SMCC by passage over a small G-25 gel
filtration column.

For chemical cross-linking of transport
polypeptides to β-galactosidase, we mixed the solution
of β-galactosidase-SMCC with 100 µg of transport
polypeptide (tat1-72, tat37-72, tat38-58GGC, tat37-58,
tat47-58GGC or tatCGG47-58) dissolved in 200 µl of
50 mM phosphate buffer (pH 7.5). We allowed the
reaction to proceed for 60 minutes at room temperature.
We then isolated the transport polypeptide-β-galactosidase
conjugate by loading the reaction mixture
on an S-200HR gel filtration column and collecting the
void volume.

The transport polypeptide-β-galactosidase
conjugate thus obtained yielded positive results when
assayed for tat in conventional Western blot and ELISA
analyses performed with rabbit anti-tat polyclonal
antibodies. For a general discussion of Western blot
and ELISA analysis, see E. Harlow and D. Lane,
Antibodies: A Laboratory Manual, Cold Spring Harbor
Laboratory (1988). Gel filtration analysis with
Superose 6 (Pharmacia LKB, Piscataway, NJ) indicated
the transport polypeptide-β-galactosidase conjugate to
have a molecular weight of about 540,000 daltons.
Specific activity of the transport
polypeptide-β-galactosidase conjugate was 52% of the specific
activity of the β-galactosidase starting material, when
assayed with o-nitrophenyl-β-D-galactopyranoside
("ONPG"). The ONPG assay procedure is described in
detail at pages 16.66-16.67 of Sambrook et al. (supra).

Cellular Uptake of β-Galactosidase Conjugates

We added the conjugates to the medium of HeLa
cells (ATCC no. CCL2) at 20 µg/ml, in the presence or
absence of 100 µM chloroquine. We incubated the cells
for 4-18 hours at 37°C/5.5% CO2. We fixed the cells
with 2% formaldehyde, 0.2% glutaraldehyde in
phosphate-buffered saline ("PBS") for 5 minutes at 4°C. We then
washed the cells three times with 2 mM MgCl2 in PBS,
and stained them with X-gal, at 37°C. X-gal is a
colorless β-galactosidase substrate
(5-bromo-4-chloro-3-indolyl β-D-galactoside) that yields a blue product
upon cleavage by β-galactosidase. Our X-gal staining
solution contained 1 mg of X-gal (Bio-Rad, Richmond,
CA, cat. no. 170-3455) per ml of PBS containing 5 mM
potassium ferricyanide, 5 mM potassium ferrocyanide and
2 mM MgCl2.

WO 94/04686 PCT/US93/07833

We subjected the stained cells to microscopic
examination at magnifications up to 400X. Such
microscopic examination revealed nuclear staining, as
well as cytoplasmic staining.

The cells to which the tat37-72-β-galactosidase
conjugate or tat1-72-β-galactosidase
conjugate was added stained dark blue. β-galactosidase
activity could be seen after a development time as
short as 15 minutes. For comparison, it should be
noted that stain development time of at least 6 hours
is normally required when β-galactosidase activity is
introduced into cells by means of transfection of the
β-galactosidase gene. Nuclear staining was visible in
the absence of chloroquine, although the nuclear
staining intensity was slightly greater in
chloroquine-treated cells. Control cells treated with unconjugated
β-galactosidase showed no detectable staining.

Cleavable Conjugation by Direct Disulfide 

Each β-galactosidase tetramer has 12 cysteine
residues that may be used for direct disulfide linkage
to a transport polypeptide cysteine residue. To reduce
and then protect the sulfhydryl of tat37-72, we
dissolved 1.8 mg (411 nmoles) of tat37-72 in 1 ml of 50
mM sodium phosphate (pH 8.0), 150 mM NaCl, 2 mM EDTA,
and applied the solution to a Reduce-Imm column (Pierce
Chem. Co., Rockford, IL). After 30 minutes at room
temperature, we eluted the tat37-72 from the column
with 1 ml aliquots of the same buffer, into tubes
containing 0.1 ml of 10 mM 5,5'-dithio-bis(2-nitrobenzoic
acid) ("DTNB"). We left the reduced
tat37-72 polypeptide in the presence of the DTNB for 3
hours. We then removed the unreacted DTNB from the
tat37-72-TNB by gel filtration on a 9 ml Sephadex G-10
column (Pharmacia LKB, Piscataway, NJ). We dissolved
5 mg β-galactosidase in 0.5 ml of buffer and desalted
it on a 9 ml Sephadex G-25 column (Pharmacia LKB,
Piscataway, NJ), to obtain 3.8 mg of β-galactosidase/ml
buffer. We mixed 0.5 ml aliquots of desalted
β-galactosidase solution with 0.25 or 0.5 ml of the
tat37-72-TNB preparation, and allowed the direct
disulfide cross-linking reaction to proceed at room
temperature for 30 minutes. We removed the unreacted
tat37-72-TNB from the β-galactosidase conjugate by gel
filtration on a 9 ml Sephacryl S-200 column. We
monitored the extent of the cross-linking reaction
indirectly, by measuring absorbance at 412 nm due to
the released TNB. The direct disulfide conjugates thus
produced were taken up into cells (data not shown).
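
The 412 nm readout described above can be converted into a rough estimate of released TNB with the Beer-Lambert law. The following is a minimal sketch under stated assumptions: the extinction coefficient for TNB (14,150 M^-1 cm^-1, a commonly cited value), the 1 cm path length, the reaction volume and the absorbance reading are all illustrative and are not given in this example. Only the 1.8 mg / 411 nmol figures come from the text.

```python
# Assumed values (not from the text): extinction coefficient and path length.
TNB_EXTINCTION_M_CM = 14150.0   # commonly cited value for TNB at 412 nm
PATH_LENGTH_CM = 1.0


def implied_molecular_weight(mass_mg: float, amount_nmol: float) -> float:
    """Molecular weight (g/mol) implied by a mass and a molar amount."""
    return (mass_mg * 1e-3) / (amount_nmol * 1e-9)


def released_tnb_nmol(absorbance_412: float, volume_ml: float) -> float:
    """Amount of released TNB (nmol) from an A412 reading via Beer-Lambert."""
    molar = absorbance_412 / (TNB_EXTINCTION_M_CM * PATH_LENGTH_CM)
    return molar * (volume_ml * 1e-3) * 1e9


mw = implied_molecular_weight(1.8, 411)        # figures quoted in the example
tnb = released_tnb_nmol(0.20, volume_ml=1.0)   # hypothetical reading and volume
print(f"implied MW of tat37-72: {mw:.0f} g/mol")
print(f"TNB released: {tnb:.1f} nmol")
```

Because one TNB is released per disulfide formed, the released amount gives an indirect upper estimate of the number of transport polypeptides coupled, which is how the text uses the measurement.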

Cleavable Conjugation with SPDP

We used the heterobifunctional cross-linking
reagent SPDP, which contains a cleavable disulfide
bond, to form a cross-link between: (1) the primary
amine groups of β-galactosidase and the cysteine
sulfhydryls of tat1-72 (metabolically labelled with
35S); or (2) the primary amine groups of
rhodamine-labelled β-galactosidase and the amino terminal
cysteine sulfhydryl of tat37-72.






For the tat1-72 conjugation, we dissolved
5 mg of β-galactosidase in 0.5 ml of 50 mM sodium
phosphate (pH 7.5), 150 mM NaCl, 2 mM MgCl2, and
desalted the β-galactosidase on a 9 ml Sephadex G-25
column (Pharmacia LKB, Piscataway, NJ). We treated the
desalted β-galactosidase with an 88-fold molar excess
of iodoacetamide at room temperature for 2 hours, to
block free sulfhydryl groups. After removing the
unreacted iodoacetamide by gel filtration, we treated
the blocked β-galactosidase with a 10-fold molar excess
of SPDP at room temperature. After 2 hours, we
exchanged the buffer by ultrafiltration (Ultrafree 30,
Millipore, Bedford, MA). We then added a 4-fold molar
excess of labelled tat1-72, and allowed the
cross-linking reaction to proceed overnight, at room
temperature. We removed the unreacted tat1-72 by gel
filtration on a 9 ml Sephacryl S-200 column. Using the
known specific activity of the labelled tat1-72, we
calculated that there were 1.1 tat1-72 polypeptides
cross-linked per β-galactosidase tetramer. Using the
ONPG assay, we found that the conjugated
β-galactosidase retained 100% of its enzymatic
activity. Using measurement of cell-incorporated
radioactivity and X-gal staining, we demonstrated
uptake of the conjugate into cultured HeLa cells.

For the tat37-72 conjugation, our procedure
was as described in the preceding paragraph, except
that we labelled the β-galactosidase with a 5:1 molar
ratio of rhodamine maleimide at room temperature for 1
hour, prior to the iodoacetamide treatment (100:1
iodoacetamide molar excess). In the cross-linking
reaction, we used an SPDP ratio of 20:1, and a tat37-72
ratio of 10:1. We estimated the conjugated product
to have about 5 rhodamine molecules (according to UV
absorbance) and about 2 tat37-72 moieties (according to
gel filtration) per β-galactosidase tetramer. The
conjugate from this procedure retained about 35% of the
initial β-galactosidase enzymatic activity. Using
X-gal staining and rhodamine fluorescence, we
demonstrated that the SPDP conjugate was taken up into
cultured HeLa cells.

EXAMPLE 3 

Animal Studies with 
β-Galactosidase Conjugates

For conjugate half-life determination and
biodistribution analysis, we injected either 200 µg of
SMCC-β-galactosidase (control) or
tat1-72-β-galactosidase intravenously ("IV") into the tail
veins of Balb/c mice (Jackson Laboratories), with and
without chloroquine. We collected blood samples at
intervals up to 30 minutes. After 30 minutes, we
sacrificed the animals and removed organs and tissues
for histochemical analysis.

We measured β-galactosidase activity in blood
samples by the ONPG assay. The ONPG assay procedure is
described in detail at pages 16.66-16.67 of Sambrook
et al. (supra). β-galactosidase and
tat1-72-β-galactosidase were rapidly cleared from the
bloodstream. We estimated their half-lives at 3-6
minutes. These experimental comparisons indicated that
attachment of the tat1-72 transport polypeptide has
little or no effect on the clearance rate of
β-galactosidase from the blood.
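
To put the reported 3-6 minute half-lives in context, the short sketch below estimates how much circulating activity would remain at the 30-minute sacrifice point. It assumes simple first-order (single-exponential) clearance, which the text does not state; the function name is illustrative.

```python
def fraction_remaining(minutes: float, half_life_min: float) -> float:
    """Fraction of initial activity left, assuming first-order clearance."""
    return 0.5 ** (minutes / half_life_min)


# Half-life range reported in the text, evaluated at the 30-minute time point.
for half_life in (3.0, 6.0):
    left = fraction_remaining(30.0, half_life)
    print(f"t1/2 = {half_life:.0f} min: {100 * left:.2f}% remaining at 30 min")
```

Under that assumption only a few percent or less of the injected enzyme would still be in the blood when the tissues were collected.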

To detect cellular uptake of the transport
polypeptide-β-galactosidase conjugates, we prepared
thin frozen tissue sections from sacrificed animals
(above), carried out fixation as described in Example 2
(above), and subjected them to a standard X-gal
staining procedure. Liver, spleen and heart stained
intensely. Lung and skeletal muscle stained less
intensely. Brain, pancreas and kidney showed no
detectable staining. High power microscopic
examination revealed strong cellular, and in some
cases, nuclear staining of what appeared to be
endothelial cells surrounding the blood supply to the
tissues.

EXAMPLE 4 

Cellular Uptake Tests with β-Galactosidase-Polyarginine
and β-Galactosidase-Polylysine Conjugates

To compare the effectiveness of simple basic
amino acid polymers with the effectiveness of our
tat-derived transport polypeptides, we conjugated
commercially available polyarginine (Sigma Chem. Co.,
St. Louis, MO, cat. no. P-4663) and polylysine (Sigma
cat. no. P-2658) to β-galactosidase, as described in
Example 2, above. We added the conjugates to the
medium of HeLa cells at 1-30 µg/ml, with and without
chloroquine. Following incubation with the conjugates,
we fixed, stained and microscopically examined the
cells as described in Example 2, above.

The polylysine-β-galactosidase conjugate gave
low levels of surface staining and no nuclear staining.
The polyarginine-β-galactosidase conjugate gave intense
overall staining, but showed less nuclear stain than
the tat1-72-β-galactosidase and
tat37-72-β-galactosidase conjugates. To distinguish between
cell surface binding and actual internalization of the
polyarginine-β-galactosidase conjugate, we treated the
cells with trypsin, a protease, prior to the fixing and
staining procedures. Trypsin treatment eliminated most
of the X-gal staining of polyarginine-β-galactosidase
treated cells, indicating that the
polyarginine-β-galactosidase conjugate was bound to the outside
surfaces of the cells rather than actually
internalized. In contrast, cells exposed to the
tat1-72 or tat37-72-β-galactosidase conjugates stained despite
trypsin treatment, indicating that the β-galactosidase
cargo was inside the cells and thus protected from
trypsin digestion. Control cells treated with
unconjugated β-galactosidase showed no detectable
staining.

EXAMPLE 5 

Horseradish Peroxidase Conjugates

Chemical Cross-Linking 

To produce tat1-72-HRP and tat37-72-HRP
conjugates, we used a commercially available HRP
coupling kit (Immunopure maleimide activated HRP,
Pierce Chem. Co., cat. no. 31498G). The HRP supplied
in the kit is in a form that is selectively reactive
toward free -SH groups. (Cysteine is the only one of
the 20 protein amino acids having a free -SH group.)
In a transport polypeptide-HRP conjugation experiment
involving tat1-72, we produced the tat1-72 starting
material in E. coli and purified it by HPLC, as
described in Example 1, above. We lyophilized 200
of the purified tat1-72 (which was dissolved in
TFA/acetonitrile) and redissolved it in 100 µl of
100 mM HEPES buffer (pH 7.5), 0.5 mM EDTA. We added
50 µl of the tat1-72 or tat37-72 solution to 50 µl of
Immunopure HRP (750 µg of the enzyme) in 250 mM
triethanolamine (pH 8.2). We allowed the reaction to
proceed for 80 minutes, at room temperature. Under
these conditions, approximately 70% of the HRP was
chemically linked to tat1-72 molecules. We monitored
the extent of the linking reaction by SDS-PAGE
analysis.

Cellular Uptake of HRP Conjugates

We added the conjugates to the medium of HeLa
cells at 20 µg/ml, in the presence or absence of 100 µM
chloroquine. We incubated the cells for 4-18 hours at
37°C/5.5% CO2. We developed the HRP stain using
4-chloro-1-naphthol (Bio-Rad, Richmond, CA, cat. no.
170-6431) and hydrogen peroxide HRP substrate. In
subsequent experiments, we substituted diaminobenzidine
(Sigma Chem. Co., St. Louis, MO) for
4-chloro-1-naphthol.

Cells to which we added transport
polypeptide-HRP conjugates displayed cell-associated
HRP activity. Short time periods of conjugate exposure
resulted in staining patterns which appeared punctate,
probably reflecting HRP in endocytic vesicles.
Following longer incubations, we observed diffuse
nuclear and cytoplasmic staining. Control cells
treated with unconjugated HRP showed no detectable
staining.

EXAMPLE 6

PE ADP Ribosylation Domain Conjugates

We cloned and expressed in E. coli the
Pseudomonas exotoxin ("PE") both in its full length
form and in the form of its ADP ribosylation domain.
We produced transport polypeptide-PE conjugates both by
genetic fusion and chemical cross-linking.

Plasmid Construction 

To construct plasmid pTat70(ApaI), we
inserted a unique ApaI site into the tat open reading
frame by digesting pTat72 with BamHI and EcoRI, and
inserting a double-stranded linker consisting of the
following synthetic oligonucleotides:

GATCCCAGAC CCACCAGGTT TCTCTGTCGG GCCCTTAAG (SEQ ID NO:8)

AATTCTTAAG GGCCCGACAG AGAAACCTGG TGGGTCTGG (SEQ ID NO:9).

The linker replaced the C-terminus of tat, LysGlnStop,
with GlyProStop. The linker also added a unique ApaI
site suitable for in-frame fusion of the tat sequence
with the PE ADP ribosylation domain-encoding sequences,
by means of the naturally-occurring ApaI site in the PE
sequence. To construct plasmid pTat70PE (SEQ ID
NO:10), we removed an ApaI-EcoRI fragment encoding the
PE ADP ribosylation domain, from plasmid
CD4(181)-PE(392). The construction of CD4(181)-PE(392) is
described by G. Winkler et al. ("CD4-Pseudomonas
Exotoxin Hybrid Proteins: Modulation of Potency and
Therapeutic Window Through Structural Design and
Characterization of Cell Internalization", AIDS
Research and Human Retroviruses, 7, pp. 393-401
(1991)). We inserted the ApaI-EcoRI fragment into
pTat70(ApaI) digested with ApaI and EcoRI.
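
The design of the BamHI-EcoRI linker (SEQ ID NO:8 and NO:9) can be sanity-checked computationally. The sketch below is illustrative only: it assumes the standard 5' overhangs left by BamHI (GATC) and EcoRI (AATT) and the standard ApaI recognition sequence (GGGCCC), none of which are spelled out in the text, and it checks that the two strands pair exactly once those overhangs are set aside.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


# Oligonucleotides as printed (SEQ ID NO:8 and NO:9), with spaces removed.
top = "GATCCCAGACCCACCAGGTTTCTCTGTCGGGCCCTTAAG"
bottom = "AATTCTTAAGGGCCCGACAGAGAAACCTGGTGGGTCTGG"

# Assumed enzyme facts (standard, but not stated in the text).
BAMHI_OVERHANG = "GATC"   # 5' overhang left by BamHI
ECORI_OVERHANG = "AATT"   # 5' overhang left by EcoRI
APAI_SITE = "GGGCCC"

# Strip each strand's 5' overhang; the remainders should pair exactly.
top_core = top[len(BAMHI_OVERHANG):]
bottom_core = bottom[len(ECORI_OVERHANG):]

anneals = top_core == reverse_complement(bottom_core)
apai_count = top.count(APAI_SITE)
print("strands anneal:", anneals)
print("ApaI sites in linker:", apai_count)
```

A check of this kind confirms that the annealed duplex carries one BamHI-compatible end, one EcoRI-compatible end and a single internal ApaI site, consistent with the cloning strategy described.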

To construct plasmid pTat8PE (SEQ ID NO:11),
we removed a 214-base pair NdeI-ApaI fragment from
pTat70PE and replaced it with a double-stranded linker
having NdeI and ApaI cohesive termini, encoding tat
residues 1-4 and 67-70, and consisting of the following
synthetic oligonucleotides:

TATGGAACCG GTCGTTTCTC TGTCGGGCC (SEQ ID NO:12)
CGACAGAGAA ACGACCGGTT CCA (SEQ ID NO:13).
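
As an illustration of what the NdeI-ApaI linker encodes, the following sketch translates the top strand (SEQ ID NO:12). It assumes that the reading frame starts at the ATG contained in the NdeI end, and it uses a deliberately minimal codon table covering only the codons present in this oligonucleotide; the helper name is hypothetical.

```python
# Minimal codon table: only the codons that occur in this linker.
CODONS = {
    "ATG": "Met", "GAA": "Glu", "CCG": "Pro", "GTC": "Val",
    "GTT": "Val", "TCT": "Ser", "CTG": "Leu", "TCG": "Ser", "GGC": "Gly",
}

top = "TATGGAACCGGTCGTTTCTCTGTCGGGCC"   # SEQ ID NO:12, spaces removed


def translate(seq: str, start: int) -> list[str]:
    """Translate the complete codons of seq beginning at index start."""
    usable = len(seq) - (len(seq) - start) % 3
    return [CODONS[seq[i:i + 3]] for i in range(start, usable, 3)]


# Assumption: the reading frame begins at the ATG of the NdeI end.
start = top.index("ATG")
peptide = translate(top, start)
print("-".join(peptide))
```

The first eight codons correspond to the eight tat-derived residues the text describes (residues 1-4 followed by residues 67-70); the final Gly codon spans the ApaI junction.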

Purification of TAT8-PE 

Expression of the pTat8-PE construct yielded
the PE ADP ribosylation domain polypeptide fused to
amino acids 1-4 and 67-70 of tat protein. The pTat8-PE
expression product ("tat8-PE") served as the PE ADP
ribosylation domain moiety (and the unconjugated
control) in chemical cross-linking experiments
described below. Codons for the 8 tat amino acids were
artifacts from a cloning procedure selected for
convenience. The 8 tat amino acids fused to the PE ADP
ribosylation domain had no transport activity
(Figure 2).

For purification of tat8-PE, we suspended 4.5
g of pTat8-PE-transformed E. coli in 20 ml of 50 mM
Tris-HCl (pH 8.0), 2 mM EDTA. We lysed the cells in a
French press and removed insoluble debris by
centrifugation at 10,000 rpm for 1 hour, in an SA600
rotor. Most of the tat8-PE was in the supernatant. We
loaded the supernatant onto a 3 ml Q-Sepharose Fast
Flow (Pharmacia LKB, Piscataway, NJ) ion exchange
column. After loading the sample, we washed the column
with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. After washing
the column, we carried out step gradient elution, using
the same buffer with 100, 200 and 400 mM NaCl. The
tat8-PE eluted with 200 mM NaCl. Following the ion
exchange chromatography, we further purified the
tat8-PE by gel filtration on a Superdex 75 FPLC column
(Pharmacia LKB, Piscataway, NJ). We equilibrated the
gel filtration column with 50 mM HEPES (pH 7.5). We
then loaded the sample and carried out elution with the
equilibration buffer at 0.34 ml/min. We collected
1.5-minute fractions and stored the tat8-PE fractions at
-70°C.

Crosslinking of TAT8-PE 

Since the PE ADP ribosylation domain has no
cysteine residues, we used sulfo-SMCC (Pierce Chem.
Co., Rockford, IL, cat. no. 22322 G) for transport
polypeptide-tat8-PE conjugation. We carried out the
conjugation in a 2-step reaction procedure. In the
first reaction step, we treated tat8-PE (3 mg/ml), in
50 mM HEPES (pH 7.5), with 10 mM sulfo-SMCC, at room
temperature, for 40 minutes. (The sulfo-SMCC was added
to the reaction as a 100 mM stock solution in 1 M
HEPES, pH 7.5.) We separated the tat8-PE-sulfo-SMCC
from the unreacted sulfo-SMCC by gel filtration on a
P6DG column (Bio-Rad, Richmond, CA) equilibrated with
25 mM HEPES (pH 6.0), 25 mM NaCl. In the second
reaction step, we allowed the tat8-PE-sulfo-SMCC (1.5
mg/ml in 100 mM HEPES (pH 7.5), 1 mM EDTA) to react with
purified tat37-72 (600 µM final conc.) at room
temperature, for 1 hour. To stop the cross-linking
reaction, we added cysteine. We analyzed the
cross-linking reaction products by SDS-PAGE. About 90% of
the tat8-PE became cross-linked to the tat37-72
transport polypeptide under these conditions.
Approximately half of the conjugated product had one
transport polypeptide moiety, and half had two
transport polypeptide moieties.
Cell-Free Assay for PE ADP Ribosylation

To verify that the PE ribosylation domain
retained its biological activity (i.e., destructive
ribosome modification) following conjugation to
transport polypeptides, we tested the effect of
transport polypeptide-PE ADP ribosylation conjugates on
in vitro (i.e., cell-free) translation. For each
in vitro translation experiment, we made up a fresh
translation cocktail and kept it on ice. The in vitro
translation cocktail contained 200 µl rabbit
reticulocyte lysate (Promega, Madison, WI), 2 µl 10 mM
ZnCl2 (optional), 4 µl of a mixture of the 20 protein
amino acids except methionine, and 20 µl
35S-methionine. To 9 µl of translation cocktail we added
from 1 to 1000 ng of transport polypeptide-PE conjugate
(preferably in a volume of 1 µl) or control, and
pre-incubated the mixture for 60 minutes at 30°C. We then
added 0.5 µl BMV RNA to each sample and incubated for
an additional 60 minutes at 30°C. We stored the
samples at -70°C after adding 5 µl of 50% glycerol per
sample. We analyzed the in vitro translation reaction
products by SDS-PAGE techniques. We loaded 2 µl of
each translation reaction mixture (plus an appropriate
volume of SDS-PAGE sample buffer) per lane on the SDS
gels. After electrophoresis, we visualized the
35S-containing in vitro translation products by
fluorography.
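
The cocktail recipe above fixes how many reactions one batch supports. The following sketch, assuming all component volumes are in microliters and ignoring pipetting losses, totals the cocktail, counts the 9 µl aliquots it yields and estimates the ZnCl2 concentration in the mixed cocktail.

```python
# Component volumes of the translation cocktail, in microliters.
cocktail_ul = {
    "rabbit reticulocyte lysate": 200.0,
    "10 mM ZnCl2 (optional)": 2.0,
    "amino acid mix minus methionine": 4.0,
    "35S-methionine": 20.0,
}
ALIQUOT_UL = 9.0   # cocktail used per reaction

total_ul = sum(cocktail_ul.values())
reactions = int(total_ul // ALIQUOT_UL)

# ZnCl2 concentration in the cocktail via C1*V1 = C2*V2, in micromolar.
zn_final_um = 10_000.0 * cocktail_ul["10 mM ZnCl2 (optional)"] / total_ul

print(f"total cocktail: {total_ul:.0f} ul")
print(f"9 ul reactions per batch: {reactions}")
print(f"ZnCl2 in cocktail: {zn_final_um:.1f} uM")
```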

Using the procedure described in the
preceding paragraph, we found that the PE ADP
ribosylation domain genetically fused to the tat1-70
transport polypeptide had no biological activity, i.e.,
did not inhibit in vitro translation. In contrast,
using the same procedure, we found that the PE ADP
ribosylation domain chemically cross-linked to the
tat37-72 transport polypeptide had retained full
biological activity, i.e., inhibited in vitro
translation as well as the non-conjugated PE ADP
ribosylation domain controls (Figure 2).

Cytotoxicity Assay for PE ADP Ribosylation

In a further test involving the tat37-72-PE
ADP ribosylation domain conjugate, we added it to
cultured HeLa cells in the presence or absence of
100 µM chloroquine. We then assayed cytotoxicity by
measuring in vivo protein synthesis, as indicated by
trichloroacetic acid ("TCA")-precipitable radioactivity
in cell extracts.

We performed the cytotoxicity assay as
follows. We disrupted HeLa cell layers, centrifuged
the cells and resuspended them at a density of
2.5 X lo'^/ml of medium. We used 0.5 ml of
suspension/well when using 24-well plates, or 0.25 ml
of suspension/well when using 48-well plates. We added
conjugates or unconjugated controls, dissolved in
100 µl of PBS, to the wells after allowing the cells to
settle for at least 4 hours. We incubated the cells in
the presence of conjugates or controls for 60 minutes
at 37°C, then added 0.5 ml of fresh medium to each
well, and incubated the cells for an additional 5-24
hours. Following this incubation, we removed the
medium from each well and washed the cells once with
about 0.5 ml PBS. We then added 1 µCi of
35S-methionine (Amersham, in vivo cell labelling
grade, SJ.1015) per 100 µl per well, and incubated the
cells for 2 hours. After two hours, we removed the
radioactive medium and washed the cells 3 times with
cold 5% TCA and then once with PBS. We added 100 µl of
0.5 M NaOH to each well and allowed at least 45 minutes
for cell lysis and protein dissolution to take place.
We then added 50 µl of 1 M HCl to each well and
transferred the entire contents of each well into
scintillation fluid for liquid scintillation
measurement of radioactivity.
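
The alkaline lysis and acid addition at the end of this procedure can be checked with a short calculation. The sketch below is illustrative only: it assumes the volumes are 100 µl of 0.5 M NaOH and 50 µl of 1 M HCl, and the helper name `micromoles` is hypothetical.

```python
def micromoles(volume_ul: float, molarity: float) -> float:
    """Return the amount of solute in micromoles (µl x mol/l = µmol)."""
    return volume_ul * molarity


# Assumed volumes and concentrations for the lysis and acid-addition steps.
naoh_umol = micromoles(100, 0.5)  # 0.5 M NaOH used to lyse the cells
hcl_umol = micromoles(50, 1.0)    # 1 M HCl added before scintillation counting

# Equal amounts of acid and base mean the lysate is neutralized.
is_neutralized = naoh_umol == hcl_umol
print(naoh_umol, hcl_umol, is_neutralized)
```

Under these assumptions the acid exactly matches the base, which is consistent with neutralizing the lysate before it is mixed with scintillation fluid.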

In the absence of chloroquine, there was a
clear dose-dependent inhibition of cellular protein
synthesis in response to treatment with the transport
polypeptide-PE ADP ribosylation domain conjugate, but
not in response to treatment with the unconjugated PE
ADP ribosylation domain. The results are summarized in
Figure 2. When conjugated to tat37-72, the PE ADP
ribosylation domain appeared to be transported 3- to
10-fold more efficiently than when conjugated to tat1-72.
We also conjugated transport polypeptides tat38-58GGC,
tat37-58, tat47-58GGC and tatCGG47-58 to the PE ADP
ribosylation domain. All of these conjugates resulted
in cellular uptake of biologically active PE ADP
ribosylation domain (data not shown).

EXAMPLE 7

Ribonuclease Conjugates

Chemical Cross-Linking

We dissolved 7.2 mg of bovine pancreatic
ribonuclease A, Type 12A (Sigma Chem. Co., St. Louis,
MO, cat. no. R5500) in 200 µl PBS (pH 7.5). To the
ribonuclease solution, we added 1.4 mg sulfo-SMCC
(Pierce Chem. Co., Rockford, IL, cat. no. 22322H).
After vortex mixing, we allowed the reaction to proceed
at room temperature for 1 hour. We removed unreacted
SMCC from the ribonuclease-SMCC by passing the reaction
mixture over a 9 ml P6DG column (Bio-Rad, Richmond, CA)
and collecting 0.5 ml fractions. We identified the
void volume peak fractions (containing the
ribonuclease-SMCC conjugate) by monitoring UV
absorbance at 280 nm. We divided the pooled
ribonuclease-SMCC-containing fractions into 5 equal
aliquots. To each of 4 ribonuclease-SMCC aliquots, we
added a chemically-synthesized transport polypeptide
corresponding to tat residues: 37-72 ("37-72"); 38-58
plus GGC at the carboxy terminal ("38-58GGC"); 37-58
("CGG37-58"); or 47-58 plus CGG at the amino terminal
("CGG47-58"). We allowed the transport
polypeptide-ribonuclease conjugation reactions to proceed for 2
hours at room temperature, and then overnight at 4°C.
We analyzed the reaction products by SDS-PAGE on a
10-20% gradient gel. The cross-linking efficiency was
approximately 60% for transport polypeptides
tat38-58GGC, tat37-58 and tatCGG47-58, and 40% for tat37-72.
Of the modified species, 72% contained one, and 25%
contained 2 transport polypeptide substitutions.
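
The reagent masses given for the activation step can be turned into an approximate molar ratio. The sketch below is a rough estimate, not part of the described procedure: the two molecular weights are approximate literature values assumed here (they do not appear in the text), and the function name is hypothetical.

```python
# Approximate molecular weights; assumptions, not values stated in the text.
RNASE_A_MW = 13_700.0   # g/mol, bovine pancreatic ribonuclease A (approx.)
SULFO_SMCC_MW = 436.4   # g/mol, sulfo-SMCC sodium salt (approx.)


def micromoles_from_mg(mass_mg: float, mol_weight: float) -> float:
    """Convert a mass in mg to micromoles (mg / (g/mol) = mmol, x1000 = µmol)."""
    return mass_mg / mol_weight * 1000.0


rnase_umol = micromoles_from_mg(7.2, RNASE_A_MW)     # 7.2 mg ribonuclease A
smcc_umol = micromoles_from_mg(1.4, SULFO_SMCC_MW)   # 1.4 mg sulfo-SMCC
molar_excess = smcc_umol / rnase_umol

print(f"ribonuclease: {rnase_umol:.2f} µmol, sulfo-SMCC: {smcc_umol:.2f} µmol")
print(f"approximate molar excess of cross-linker: {molar_excess:.1f}-fold")
```

With these assumed molecular weights the cross-linker is present at roughly a six-fold molar excess over ribonuclease.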
Cellular Uptake of Tat37-72-Ribonuclease Conjugates

We maintained cells at 37°C in a tissue
culture incubator in Dulbecco's Modified Eagle Medium
supplemented with 10% donor calf serum and
penicillin/streptomycin. For cellular uptake assays,
we plated 10^ cells in a 24-well plate and cultured
them overnight. We washed the cells with Dulbecco's
PBS and added the ribonuclease conjugate dissolved in
300 µl of PBS containing 80 µM chloroquine, at
concentrations of 0, 10, 20, 40 and 80 µg/ml. After a
1.25 hour incubation at 37°C, we added 750 µl of growth
medium and further incubated the cell samples
overnight. After the overnight incubation, we washed
the cells once with PBS and incubated them for 1 hour
in Minimal Essential Medium without methionine (Flow
Labs) (250 µl/well) containing 35S-methionine
(1 µCi/well). After the 1 hour incubation with
radioactive methionine, we removed the medium and
washed the cells three times with 5% TCA (1 ml/well/wash).
We then added 250 µl of 0.5 M NaOH per well. After
1 hour at room temperature, we pipetted 200 µl of the
contents of each well into a scintillation vial, added
100 µl of 1 M HCl and 4 ml of scintillation fluid.
After thorough mixing of the contents of each vial, we
measured radioactivity in each sample by liquid
scintillation counting.

The cellular uptake results are summarized in
Figure 3. Transport polypeptide tat38-58GGC functioned
as well as, or slightly better than, tat37-72.
Transport polypeptide tatCGG47-58 had reduced activity
(data not shown). We do not know whether this
polypeptide had reduced uptake activity or whether the
proximity of the basic region to the ribonuclease
interfered with enzyme activity.

We have used cation exchange chromatography
(BioCAD perfusion chromatography system, PerSeptive
Biosystems) to purify ribonuclease conjugates having
one or two transport polypeptide moieties.

EXAMPLE 8

Protein Kinase A Inhibitor Conjugates

Chemical Cross-Linking

We purchased the protein kinase A inhibitor
("PKAI") peptide (20 amino acids) from Bachem
California (Torrance, CA). For chemical cross-linking
of PKAI to transport polypeptides, we used either
sulfo-MBS (at 10 mM) or sulfo-SMPB (at 15 mM). Both of
these cross-linking reagents are heterobifunctional for
thiol groups and primary amine groups. Since PKAI
lacks lysine and cysteine residues, both sulfo-MBS and
sulfo-SMPB selectively target cross-linking to the
amino terminus of PKAI. We reacted PKAI at a
concentration of 2 mg/ml, in the presence of 50 mM
HEPES (pH 7.5), 25 mM NaCl, at room temperature, for 50
minutes, with either cross-linking reagent. The
sulfo-MBS reaction mixture contained 10 mM sulfo-MBS and 20%
DMF. The sulfo-SMPB reaction mixture contained 15 mM
sulfo-SMPB and 20% dimethylsulfoxide ("DMSO"). We
purified the PKAI-cross-linker adducts by reverse phase
HPLC, using a C^ column. We eluted the samples from
the C^ column in a 20-75% acetonitrile gradient
containing 0.1% trifluoroacetic acid. We removed the
acetonitrile and trifluoroacetic acid from the adducts
by lyophilization and redissolved them in 25 mM HEPES
(pH 6.0), 25 mM NaCl. We added tat1-72 or tat37-72 and
adjusted the pH of the reaction mixture to 7.5, by
adding 1 M HEPES (pH 7.5) to 100 mM. We then allowed
the cross-linking reaction to proceed at room
temperature for 60 minutes.

We regulated the extent of cross-linking by
altering the transport polypeptide:PKAI ratio. We
analyzed the cross-linking reaction products by
SDS-PAGE. With tat37-72, a single new electrophoretic band
formed in the cross-linking reactions. This result was
consistent with the addition of a single tat37-72
molecule to a single PKAI molecule. With tat1-72, six
new products formed in the cross-linking reactions.
This result is consistent with the addition of multiple
PKAI molecules per tat1-72 polypeptide, as a result of
the multiple cysteine residues in tat1-72. When we
added PKAI to the cross-linking reaction in large molar
excess, we obtained only conjugates containing 5 or 6
PKAI moieties per tat1-72.

In Vitro Phosphorylation Assay for PKAI Activity

To test the sulfo-MBS-cross-linked conjugates
for retention of PKAI biological activity, we used an
in vitro phosphorylation assay. In this assay, histone
V served as the substrate for phosphorylation by
protein kinase A in the presence or absence of PKAI (or
a PKAI conjugate). We then used SDS-PAGE to monitor
PKAI-dependent differences in the extent of
phosphorylation. In each reaction, we incubated 5
units of the catalytic subunit of protein kinase A
(Sigma) with varying amounts of PKAI or PKAI conjugate,
at 37°C, for 30 minutes. The assay reaction mixture
contained 24 mM sodium acetate (pH 6.0), 25 mM MgCl2,
100 mM DTT, 50 µCi of [γ-32P]ATP and 2 µg of histone V,
in a total reaction volume of 40 µl. Using this assay,
we found that PKAI conjugated to tat1-72 or tat37-72
inhibited phosphorylation as well as unconjugated PKAI
(data not shown).

Cellular Assay

To test for cellular uptake of PKAI and
transport polypeptide-PKAI conjugates, we employed
cultured cells containing a chloramphenicol
acetyltransferase ("CAT") reporter gene under the
control of a cAMP-responsive expression control
sequence. We thus quantified protein kinase A activity
indirectly, by measuring CAT activity. This assay has
been described in detail by J. R. Grove et al.
("Probing cAMP-Related Gene Expression with a
Recombinant Protein Kinase Inhibitor", Molecular
Aspects of Cellular Regulation, Vol. 6, P. Cohen and J.
G. Folkes, eds., Elsevier Scientific, Amsterdam,
pp. 173-95 (1991)).

Using this assay, we found no activity by
PKAI or any of the transport polypeptide-PKAI
conjugates. This result suggested to us that the PKAI
moiety might be undergoing rapid degradation upon entry
into the cells.

Cross-Linking of PKAI to Tat37-72-β-Galactosidase

We had previously found cellular uptake of
tat37-72-β-galactosidase to be chloroquine-independent
(Example 2, above). Therefore, we cross-linked PKAI to
tat37-72-β-galactosidase for possible protection of
PKAI against rapid degradation.

We treated β-galactosidase with 20 mM DTT (a
reducing agent) at room temperature for 30 minutes and
then removed the DTT by gel filtration on a G50 column
in MES buffer (pH 5). We allowed the reduced
β-galactosidase to react with SMPB-activated PKAI
(above), at pH 6.5, for 60 minutes. To block residual
free sulfhydryl groups, we added N-ethylmaleimide or
iodoacetamide. SDS-PAGE analysis showed that at least
95% of the β-galactosidase had been conjugated. About
90% of the conjugated β-galactosidase product
contained one PKAI moiety per subunit, and about 10%
contained 2 PKAI moieties. We treated the
PKAI-β-galactosidase conjugate with a 10-fold molar excess of
sulfo-SMCC. We then reacted the
PKAI-β-galactosidase-SMCC with tat1-72. According to SDS-PAGE analysis, the
PKAI-β-galactosidase:tat1-72 ratio appeared to be
1:0.5. We have produced about 100 µg of the final
product. Because of precipitation problems, the
concentration of the final product in solution has been
limited to 100 µg/ml.

EXAMPLE 9

E2 Repressor Conjugates

To test cellular uptake and E2 repressor
activity of transport polypeptide-E2 repressor
conjugates, we simultaneously transfected an
E2-dependent reporter plasmid and an E2 expression plasmid
into SV40-transformed African green monkey kidney
("COS7") cells. Then we exposed the transfected cells
to transport polypeptide-E2 repressor conjugates (made
by genetic fusion or chemical cross-linking) or to
appropriate controls. The repression assay, described
below, was essentially as described in Barsoum et al.
(supra).

Repression Assay Cells

We obtained the COS7 cells from the American
Type Culture Collection, Rockville, MD (ATCC No. CRL
1651). We propagated the COS7 cells in Dulbecco's
modified Eagle's medium (GIBCO, Grand Island, NY) with
10% fetal bovine serum (JRH Biosciences, Lenexa, KS)
and 4 mM glutamine ("growth medium"). Cell incubation
conditions were 5.5% CO2 at 37°C.

Repression Assay Plasmids

Our E2-dependent reporter plasmid, pXB332hGH,
contained a human growth hormone reporter gene driven
by a truncated SV40 early promoter having 3 upstream E2
binding sites. We constructed the hGH reporter
plasmid, pXB332hGH, as described in Barsoum et al.
(supra).

For expression of a full-length HPV E2 gene,
we constructed plasmid pAHE2 (Figure 4). Plasmid pAHE2
contains the E2 gene from HPV strain 16, operatively
linked to the adenovirus major late promoter augmented
by the SV40 enhancer, upstream of the promoter. We
isolated the HPV E2 gene from plasmid pHPV16 (the
full-length HPV16 genome cloned into pBR322), described in
M. Durst et al., "A Papillomavirus DNA from Cervical
Carcinoma and Its Prevalence in Cancer Biopsy Samples
from Different Geographic Regions", Proc. Natl. Acad.
Sci. USA, 80, pp. 3812-15 (1983), as a Tth111I-AseI
fragment. Tth111I cleaves at nucleotide 2711, and AseI
cleaves at nucleotide 3929 in the HPV16 genome. We
blunted the ends of the Tth111I-AseI fragment in a DNA
polymerase I Klenow reaction, and ligated BamHI linkers
(New England Biolabs, cat. no. 1021). We inserted this
linker-bearing fragment into BamHI-cleaved plasmid
pBG331, to create plasmid pAHE2.
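
The approximate size of the E2-containing fragment follows from the two cleavage positions quoted above. The sketch below is only an estimate: it treats the quoted nucleotide numbers as the cut coordinates and ignores the few bases gained or lost through overhangs, Klenow blunting and linker addition. The function name is hypothetical.

```python
def fragment_length_bp(upstream_cut: int, downstream_cut: int) -> int:
    """Approximate length of a fragment bounded by two cut positions on a genome map."""
    if downstream_cut <= upstream_cut:
        raise ValueError("downstream cut must lie after the upstream cut")
    return downstream_cut - upstream_cut


# Cleavage positions quoted for the HPV16 genome.
e2_fragment_bp = fragment_length_bp(2711, 3929)
print(e2_fragment_bp)  # 1218
```

A fragment of about 1.2 kb is what one would look for on a gel when isolating this insert.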

Plasmid pBG331 is the same as pBG312 (R.L.
Cate et al., "Isolation of the Bovine and Human Genes
for Mullerian Inhibiting Substance and Expression of
the Human Gene in Animal Cells", Cell, 45, pp. 685-98
(1986)) except that it lacks the BamHI site downstream
of the SV40 polyadenylation signal, making the BamHI
site between the promoter and the SV40 intron unique.
We removed the unwanted BamHI site by partial BamHI
digestion of pBG312, gel purification of the linearized
plasmid, blunt end formation by DNA polymerase I Klenow
treatment, self-ligation and screening for plasmids
with the desired deletion of the BamHI site.

Bacterial Production of E2 Repressor Proteins

One of our E2 repressor proteins, E2.123,
consisted of the carboxy-terminal 121 amino acids of
HPV16 E2 with MetVal added at the amino terminus. We
also used a variant of E2.123, called E2.123CCSS.
E2.123 has cysteine residues at HPV16 E2 amino acid
positions 251, 281, 300 and 309. In E2.123CCSS, the
cysteine residues at positions 300 and 309 were changed
to serine, and the lysine residue at position 299 was
changed to arginine. We replaced the cysteine residues
at positions 300 and 309, so that cysteine-dependent
chemical cross-linking could take place in the amino
terminal portion of the E2 repressor, but not in the E2
minimal DNA binding/dimerization domain. We considered
crosslinks in the minimal DNA binding domain likely to
interfere with the repressor's biological activity.

For construction of plasmid pET8c-123
(Figure 5; SEQ ID NO:14), we produced the necessary DNA
fragment by standard polymerase chain reaction ("PCR")
techniques, with plasmid pHPV16 as the template. (For
a general discussion of PCR techniques, see Chapter 14
of Sambrook et al., supra. Automated PCR equipment and
chemicals are commercially available.) The nucleotide
sequence of EA52, the PCR oligonucleotide primer for
the 5' end of the 374 base pair E2-123 fragment, is set
forth in the Sequence Listing under SEQ ID NO:15. The
nucleotide sequence of EA54, the PCR oligonucleotide
primer used for the 3' end of the E2-123 fragment, is
set forth in the Sequence Listing under SEQ ID NO:16.
We digested the PCR products with NcoI and BamHI and
cloned the resulting fragment into NcoI/BamHI-digested
expression plasmid pET8c (Studier et al., supra), to
create plasmid pET8c-123.

By using the same procedure with a different
5' oligonucleotide PCR primer, we obtained a 260 base
pair fragment ("E2-85") containing a methionine codon
and an alanine codon immediately followed by codons for
the carboxy-terminal 83 amino acids of HPV16 E2. The
nucleotide sequence of EA57, the PCR 5' primer for
producing E2-85, is set forth in the Sequence Listing
under SEQ ID NO:34.

To construct plasmid pET8c-123CCSS (Figure 6;
SEQ ID NO:17), for bacterial production of E2.123CCSS,
we synthesized an 882 bp PstI-EagI DNA fragment by PCR
techniques. The PCR template was pET8c-123. One of
the PCR primers, called 374.140, encoded all three
amino acid changes:
CGACACTGCA GTATACAATG TAGAATGCTT TTTAAATCTA TATCTTAAAG
ATCTTAAAG (SEQ ID NO:18). The other PCR primer,
374.18, had the following sequence: GCGTCGGCCG
CCATGCCGGC GATAAT (SEQ ID NO:19). We digested the PCR
reaction products with PstI plus EagI and isolated the
882 bp fragment by standard methods. The final step
was production of pET8c-123CCSS in a 3-piece ligation
joining a 3424 bp EcoRI-EagI fragment from pET8c-123
with the 882 bp PCR fragment and a 674 bp PstI-EcoRI
pET8c-123 fragment, as shown in Figure 6. We verified
the construction by DNA sequence analysis. For
production of E2.123 and E2.123CCSS proteins, we
expressed plasmids pET8c-123 and pET8c-123CCSS in
E. coli strain BL21(DE3)pLysS, as described by Studier
(supra).
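
The three-piece ligation described above can be sanity-checked by confirming that the fragment ends pair up and by summing the fragment lengths. The sketch below is illustrative: the end-matching rule is a simplification (each enzyme end must occur exactly twice), the resulting plasmid size is an arithmetic sum that is not stated in the text, and the function name is hypothetical.

```python
from collections import Counter

# (left end, right end, length in bp) for the three fragments of the ligation.
fragments = [
    ("EcoRI", "EagI", 3424),   # vector fragment from pET8c-123
    ("EagI", "PstI", 882),     # PCR fragment carrying the CCSS changes
    ("PstI", "EcoRI", 674),    # second pET8c-123 fragment
]


def closes_into_circle(frags) -> bool:
    """True if every enzyme end occurs exactly twice, so each end has a partner."""
    ends = Counter()
    for left, right, _length in frags:
        ends[left] += 1
        ends[right] += 1
    return all(count == 2 for count in ends.values())


plasmid_bp = sum(length for _left, _right, length in fragments)
print(closes_into_circle(fragments), plasmid_bp)  # True 4980
```

The compatible ends force a single orientation for each piece, which is why a three-fragment ligation of this kind can yield one defined product.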



Purification of E2 Repressor Proteins

We thawed 3.6 grams of frozen,
pET8c-123-transformed E. coli cells and suspended them in 35 ml of
25 mM Tris-HCl (pH 7.5), 0.5 mM EDTA, 2.5 mM DTT, plus
protease inhibitors (1 mM PMSF, 3 mM benzamidine,
50 µg/ml pepstatin A, 10 µg/ml aprotinin). We lysed
the cells by two passages through a French press at
10,000 psi. We centrifuged the lysate at 12,000 rpm,
in an SA600 rotor, for 1 hour. The E2.123 protein was
in the supernatant. To the supernatant, we added MES
buffer (pH 6) up to 25 mM, MES buffer (pH 5) up to
10 mM, and NaCl up to 125 mM. We then applied the
supernatant to a 2 ml S Sepharose Fast Flow column at
6 ml/hr. After loading, we washed the column with
50 mM Tris-HCl (pH 7.5), 1 mM DTT. We then carried out
step gradient elution (2 ml/step) with 200, 300, 400,
500, 700 and 1000 mM NaCl in 50 mM Tris-HCl (pH 7.5),
1 mM DTT. The E2.123 repressor protein eluted in the
500 and 700 mM NaCl fractions. SDS-PAGE analysis
indicated the E2.123 repressor purity exceeded 95%.

We thawed 3.0 grams of frozen,
pET8c-123CCSS-transformed E. coli and suspended the cells in 30 ml of
the same buffer used for pET8c-123-transformed cells
(above). Lysis, removal of insoluble cellular debris
and addition of MES buffer and NaCl were also as
described for purification of E2.123. The purification
procedure for E2.123CCSS diverged after addition of the
MES buffer and NaCl, because a precipitate formed, with
E2.123CCSS, at that point in the procedure. We removed
the precipitate by centrifugation, and found that it
and the supernatant both contained substantial E2
repressor activity. Therefore, we subjected both to
purification steps. We applied the supernatant to a
2 ml S Sepharose Fast Flow column (Pharmacia LKB,
Piscataway, NJ) at 6 ml/hr. After loading, we washed
the column with 50 mM Tris-HCl (pH 7.5), 1 mM DTT.
After washing the column, we carried out step gradient
elution (2 ml/step), using 300, 400, 500, 700 and 1000
mM NaCl in 50 mM Tris-HCl (pH 7.5), 1 mM DTT. The
E2.123CCSS protein eluted with 700 mM NaCl. SDS-PAGE
analysis indicated its purity to exceed 95%. We
dissolved the E2.123CCSS precipitate in 7.5 ml of 25 mM
Tris-HCl (pH 7.5), 125 mM NaCl, 1 mM DTT and 0.5 mM
EDTA. We loaded the dissolved material onto a 2 ml
S Sepharose Fast Flow column and washed the column as
described for E2.123 and non-precipitated E2.123CCSS.
We carried out step gradient elution (2 ml/step), using
300, 500, 700 and 1000 mM NaCl. The E2 repressor
eluted in the 500-700 mM NaCl fractions. SDS-PAGE
analysis indicated its purity to exceed 98%.
Immediately following purification of the E2.123 and
E2.123CCSS proteins, we added glycerol to a final
concentration of 15% (v/v), and stored flash-frozen
(liquid N2) aliquots at -70°C. We quantified the
purified E2 repressor proteins by UV absorbance at
280 nm, using an extinction coefficient of 1.8 at
1 mg/ml.
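
The protein quantification step reduces to dividing the measured absorbance by the stated extinction coefficient. The sketch below is a minimal illustration, assuming a 1 cm path length and readings in the linear range; the function name, the dilution parameter and the example reading of 0.9 are hypothetical.

```python
E2_A280_PER_MG_ML = 1.8  # absorbance at 280 nm of a 1 mg/ml solution, as stated


def protein_mg_per_ml(a280: float, dilution_factor: float = 1.0,
                      a280_per_mg_ml: float = E2_A280_PER_MG_ML) -> float:
    """Estimate protein concentration (mg/ml) from an A280 reading.

    Assumes a 1 cm path length; dilution_factor is the fold dilution of the
    sample that was actually measured.
    """
    if a280 < 0 or dilution_factor <= 0 or a280_per_mg_ml <= 0:
        raise ValueError("absorbance must be >= 0; other arguments must be > 0")
    return a280 * dilution_factor / a280_per_mg_ml


# Hypothetical reading: an undiluted sample with A280 = 0.9.
example_mg_per_ml = protein_mg_per_ml(0.9)
print(round(example_mg_per_ml, 3))  # 0.5
```

The same function could be reused with a different coefficient for other proteins.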

Chemical Cross-Linking

We performed chemical synthesis of the
transport polypeptide consisting of tat amino acids
37-72, as described in Example 1. We dissolved the
polypeptide (5 mg/ml) in 10 mM MES buffer (pH 5.0),
50 mM NaCl, 0.5 mM EDTA (extinction coefficient of 0.2
at 1 mg/ml). To the transport polypeptide solution, we
added a bismaleimidohexane ("BMH") (Pierce Chemical
Co., Rockford, IL, cat. no. 22319G) stock solution
(6.25 mg/ml DMF) to a final concentration of 1.25
mg/ml, and a pH 7.5 HEPES buffer stock solution (1 M)
to a final concentration of 100 mM. We allowed the BMH
to react with the protein for 30 minutes at room
temperature. We then separated the protein-BMH from
unreacted BMH by gel filtration on a G-10 column
equilibrated in 10 mM MES (pH 5), 50 mM NaCl, 0.5 mM
EDTA. We stored aliquots of the transport
polypeptide-BMH conjugate at -70°C.

For cross-linking of the transport
polypeptide-BMH conjugate to the E2 repressor, we
removed the E2 repressor protein from its storage
buffer. We diluted the E2 repressor protein with three
volumes of 25 mM MES (pH 6.0), 0.5 mM EDTA and
batch-loaded it onto S Sepharose Fast Flow (Pharmacia LKB,
Piscataway, NJ) at 5 mg protein per ml resin. After
pouring the slurry of protein-loaded resin into a
column, we washed the column with 25 mM MES (pH 6.0),
0.5 mM EDTA, 250 mM NaCl. We then eluted the bound E2
repressor protein from the column with the same buffer
containing 800 mM NaCl. We diluted the E2
repressor-containing eluate to 1 mg/ml with 25 mM MES (pH 6.0),
0.5 mM EDTA. From trial cross-linking studies
performed with each batch of E2 repressor protein and
BMH-activated transport polypeptide, we determined that
treating 1 mg of E2 repressor protein with 0.6 mg of
BMH-activated transport polypeptide yields the desired
incorporation of 1 transport molecule per E2 repressor
homodimer. Typically, we mixed 2 ml of E2 repressor (1
mg/ml) with 300 µl of tat37-72-BMH (4 mg/ml) and 200 µl
of 1 M HEPES (pH 7.5). We allowed the cross-linking
reaction to proceed for 30 minutes at room temperature.
We terminated the cross-linking reaction by adding
2-mercaptoethanol to a final concentration of 14 mM. We
determined the extent of cross-linking by SDS-PAGE
analysis. We stored aliquots of the tat37-72-E2
repressor conjugate at -70°C. We employed identical
procedures to chemically cross-link the tat37-72
transport polypeptide to the HPVE2 123 repressor
protein and the HPVE2 CCSS repressor protein.
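
The typical reaction mixture can be checked against the stated target of 0.6 mg of BMH-activated transport polypeptide per mg of E2 repressor. The sketch below is a simple arithmetic check, assuming the volumes are 2 ml, 300 µl and 200 µl as read from the text; the final HEPES concentration it reports is derived here and is not stated in the text.

```python
# Volumes (ml) and stock concentrations for the typical cross-linking reaction.
e2_volume_ml, e2_mg_per_ml = 2.0, 1.0           # E2 repressor
tat_volume_ml, tat_mg_per_ml = 0.3, 4.0         # tat37-72-BMH
hepes_volume_ml, hepes_stock_mm = 0.2, 1000.0   # 1 M HEPES (pH 7.5)

e2_mg = e2_volume_ml * e2_mg_per_ml
tat_mg = tat_volume_ml * tat_mg_per_ml
tat_mg_per_mg_e2 = tat_mg / e2_mg               # target from trial studies: 0.6

total_volume_ml = e2_volume_ml + tat_volume_ml + hepes_volume_ml
hepes_final_mm = hepes_volume_ml * hepes_stock_mm / total_volume_ml

print(f"{tat_mg_per_mg_e2:.2f} mg transport polypeptide per mg E2 repressor")
print(f"{hepes_final_mm:.0f} mM HEPES in {total_volume_ml:.1f} ml")
```

The volumes reproduce the 0.6 mg/mg ratio exactly, and the added HEPES raises the pH of the MES-buffered proteins for the maleimide reaction.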
Cellular Uptake of E2 Repressor Conjugates

For our E2 repression assays, we used
transient expression of plasmids transfected into COS7
cells. Our E2 repression assay procedure was similar
to that described in Barsoum et al. (supra). We
transfected 4 x 10^ COS7 cells (about 50% confluent at
the time of harvest) by electroporation, in two
separate transfections ("EP1" and "EP2"). In
transfection EP1, we used 20 µg pXB332hGH (reporter
plasmid) plus 380 µg sonicated salmon sperm carrier DNA
(Pharmacia LKB, Piscataway, NJ). In transfection EP2,
we used 20 µg pXB332hGH plus 30 µg pAHE2 (E2
transactivator) and 350 µg salmon sperm carrier DNA.
We carried out electroporations with a Bio-Rad Gene
Pulser, at 270 volts, 960 µFD, with a pulse time of
about 11 msec. Following the electroporations, we
seeded the cells in 6-well dishes, at 2 x 10^ cells per
well. Five hours after the electroporations, we
aspirated the growth medium, rinsed the cells with
growth medium and added 1.5 ml of fresh growth medium
to each well. At this time, we added chloroquine
("CQ") to a final concentration of 80 µM (or a blank
solution to controls). Then we added tat37-72
cross-linked to E2.123 ("TxHE2") or tat37-72 cross-linked to
E2.123CCSS ("TxHE2CCSS"). The final concentration of
these transport polypeptide-cargo conjugates was 6, 20
or 60 µg/ml of cell growth medium (Table I).
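
The DNA amounts in the two transfections can be totalled to show the role of the carrier DNA. The sketch below assumes that the EP1 amounts are in µg, like those given for EP2; the dictionary layout is purely illustrative.

```python
# DNA amounts (µg) per electroporation; EP1 units are assumed to match EP2.
transfections_ug = {
    "EP1": {"pXB332hGH": 20, "carrier DNA": 380},
    "EP2": {"pXB332hGH": 20, "pAHE2": 30, "carrier DNA": 350},
}

# Total DNA per transfection.
totals_ug = {name: sum(parts.values()) for name, parts in transfections_ug.items()}
print(totals_ug)  # {'EP1': 400, 'EP2': 400}
```

On this reading, the carrier DNA is reduced by exactly the amount of pAHE2 added, so both electroporations receive the same total DNA mass and the same amount of reporter plasmid.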
TABLE I



Identification of Samples

Well       CQ (µM)    Protein (µg/ml)

EP1.1        0          0
EP1.2       80          0
EP2.1        0          0
EP2.2        0          6 TxHE2
EP2.3        0         20 TxHE2
EP2.4        0         60 TxHE2
EP2.5        0          6 TxHE2CCSS
EP2.6        0         20 TxHE2CCSS
EP2.7        0         60 TxHE2CCSS
EP2.8       80          0
EP2.9       80          6 TxHE2
EP2.10      80         20 TxHE2
EP2.11      80         60 TxHE2
EP2.12      80          6 TxHE2CCSS
EP2.13      80         20 TxHE2CCSS
EP2.14      80         60 TxHE2CCSS


After an 18-hour incubation, we removed the
medium, rinsed the cells with fresh medium, and added
1.5 ml of fresh medium containing the same
concentrations of chloroquine and transport
polypeptide-cargo conjugates as in the preceding
18-hour incubation. This medium change was to remove any
hGH that may have been present before the repressor
entered the cells. Twenty-four hours after the medium
change, we harvested the cells and performed cell
counts to check for viability. We then assayed for hGH
on undiluted samples of growth medium according to the
method of Seldon, described in Protocols in Molecular
Biology, Green Publishing Associates, New York, pp.
9.7.1-9.7.2 (1987), using the Allegro Human Growth
Hormone transient gene expression system kit (Nichols
Institute, San Juan Capistrano, CA). We subtracted the
assay background (i.e., assay components with
non-conditioned medium added) from the hGH cpm, for all
samples. We performed separate percentage repression
calculations for a given protein treatment, according
to whether chloroquine was present ("(+)CQ") or absent
("(-)CQ") in the protein uptake test. We calculated
percentage repression according to the following
formula:

Repression = [(ACT - BKG) - (REP - BKG)] / (ACT - BKG) x 100

where: BKG = hGH cpm in the transfections of
       reporter alone (e.g., EP1.1 for (-)CQ
       and EP1.2 for (+)CQ);

       ACT = hGH cpm in the transfection of
       reporter plus transactivator, but to
       which no repressor conjugate was added
       (e.g., EP2.1 for (-)CQ and EP2.8 for
       (+)CQ);

       REP = hGH cpm in the transfection of
       reporter plus transactivator, to which a
       repressor conjugate was added (e.g.,
       EP2.2-2.7 for (-)CQ and EP2.9-2.14 for
       (+)CQ).

Data from a representative E2 repression assay are 
shown in Table II. Table I identifies the various 
samples represented in Table II. Figure 7 graphically 
depicts the results presented in Table II. 
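
The percentage repression formula can be expressed as a short function and checked against two rows of Table II. The sketch below is illustrative; the function name is hypothetical, and the inputs are the Table II cpm values after subtraction of the assay background (wells EP1.1, EP2.1 and EP2.2 without chloroquine; EP1.2, EP2.8 and EP2.14 with chloroquine).

```python
def percent_repression(act_cpm: float, rep_cpm: float, bkg_cpm: float) -> float:
    """Percentage repression = [(ACT - BKG) - (REP - BKG)] / (ACT - BKG) x 100."""
    activated = act_cpm - bkg_cpm   # E2-dependent signal without repressor
    if activated <= 0:
        raise ValueError("ACT must exceed BKG for repression to be defined")
    repressed = rep_cpm - bkg_cpm   # E2-dependent signal with repressor conjugate
    return (activated - repressed) / activated * 100.0


# (-)CQ: BKG = EP1.1, ACT = EP2.1, REP = EP2.2 (6 µg/ml TxHE2)
minus_cq = percent_repression(act_cpm=15011, rep_cpm=12671, bkg_cpm=3808)

# (+)CQ: BKG = EP1.2, ACT = EP2.8, REP = EP2.14 (60 µg/ml TxHE2CCSS)
plus_cq = percent_repression(act_cpm=14970, rep_cpm=6547, bkg_cpm=5251)

print(round(minus_cq, 1), round(plus_cq, 1))  # 20.9 86.7
```

Both values agree with the corresponding entries in the "% repression" column of Table II.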
TABLE II
E2 Repression Assay

                        cpm -
sample     hGH cpm    assay bkgd    cpm - BKG    % repression

EP1.1        3958        3808           -             -
EP1.2        5401        5251           -             -
EP2.1      15,161      15,011        11,203           -
EP2.2      12,821      12,671          8863          20.9
EP2.3      10,268      10,118          6310          43.7
EP2.4        8496        8346          4538          59.5
EP2.5      11,934      11,784          7976          28.8
EP2.6        9240        9090          5282          52.9
EP2.7        7926        7776          3968          64.6
EP2.8      15,120      14,970          9719           -
EP2.9      12,729      12,579          7328          24.6
EP2.10       9590        9440          4189          56.9
EP2.11       8440        8290          3039          68.7
EP2.12     11,845      11,695          6444          33.7
EP2.13       8175        8025          2774          71.5
EP2.14       6697        6547          1296          86.7



Transport polypeptide tat37-72 cross-linked
to either E2 repressor (E2.123 or E2.123CCSS) resulted
in a dose-dependent inhibition of E2-dependent gene
expression in the cultured mammalian cells (Table II;
Figure 7). We have repeated this experiment four
times, with similar results. The effect was
E2-specific, in that other tat37-72 conjugates had no
effect on E2 induction of pXB332hGH (data not shown).
Also, the tat37-72xHE2 conjugates had no effect on the
hGH expression level of a reporter in which the
expression of the hGH gene was driven by a constitutive
promoter which did not respond to E2. The E2 repressor
with the CCSS mutation repressed to a greater degree
than the repressor with the wild-type amino acid
sequence. This was as expected, because cross-linking
of the transport polypeptide to either of the last two
cysteines in the wild-type repressor would likely
reduce or eliminate repressor activity. Chloroquine
was not required for the repression activity. However,
chloroquine did enhance repression in all of the tests.
These results are summarized in Table II and Figure 7.

EXAMPLE 10

TATΔcys Conjugates

Production of TatΔcys

For bacterial production of a transport
polypeptide consisting of tat amino acids 1-21 fused
directly to tat amino acids 38-72, we constructed
expression plasmid pTATΔcys (Figure 8; SEQ ID NO:20).
To construct plasmid pTATΔcys, we used conventional PCR
techniques, with plasmid pTAT72 as the PCR template.
One of the oligonucleotide primers used for the PCR was
374.18 (SEQ ID NO:19), which covers the EagI site
upstream of the tat coding sequence. (We also used
oligonucleotide 374.18 in the construction of plasmid
pET8c-123CCSS. See Example 9.) The other
oligonucleotide primer for the PCR, 374.28, covers the
EagI site within the tat coding sequence and has a
deletion of the tat DNA sequence encoding amino acids
22-37. The nucleotide sequence of 374.28 is:
TTTACGGGGG TAAGAGATAG GTAGGGGTTT GGTGATGAAC GGGGT (SEQ
ID NO:21). We digested the PCR products with EagI and
isolated the resulting 762 base pair fragment. We
inserted that EagI fragment into the 4057 base pair
vector produced by EagI cleavage of pTAT72. We
verified the construction by DNA sequence analysis and
expressed the tatΔcys polypeptide by the method of
Studier et al. (supra). SDS-PAGE analysis showed the
tatΔcys polypeptide to have the correct size.
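
The residue numbering of the cysteine-deleted polypeptide and the size of the resulting plasmid can be checked with a few lines of arithmetic. The sketch below is illustrative: the polypeptide length and plasmid size are derived from the numbers in the text rather than stated there, and the helper name is hypothetical.

```python
def residue_numbers(first: int, last: int) -> list:
    """Inclusive list of residue numbers from first to last."""
    return list(range(first, last + 1))


# tat residues kept in the polypeptide and residues removed by the primer deletion.
retained = residue_numbers(1, 21) + residue_numbers(38, 72)
deleted = residue_numbers(22, 37)

polypeptide_length = len(retained)   # 21 + 35 = 56 residues
plasmid_bp = 762 + 4057              # EagI insert + EagI-cleaved vector

print(polypeptide_length, len(deleted), plasmid_bp)  # 56 16 4819
```

The retained and deleted residues together account for all 72 residues of the tat1-72 template, which confirms that the 22-37 deletion and the "1-21 fused to 38-72" description are the same construct.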
For purification of tatΔcys protein, we
thawed 4.5 grams of pTATΔcys-transformed E. coli cells,
resuspended the cells in 35 ml of 20 mM MES (pH 6.2),
0.5 mM EDTA. We lysed the cells by two passes through
a French press, at 10,000 psi. We removed insoluble
debris by centrifugation at 10,000 rpm in an SA600
rotor, for 1 hour. We applied the supernatant to a 5
ml S Sepharose Fast Flow column at 15 ml/hr. We washed
the column with 50 mM Tris-HCl (pH 7.5), 0.3 mM DTT.
We then carried out step gradient elution (2 ml/step)
with the same buffer containing 300, 400, 500, 700 and
950 mM NaCl. The tatΔcys protein eluted in the 950 mM
NaCl fraction.

We conjugated a tatΔcys transport polypeptide
to rhodamine isothiocyanate and tested it by assaying
directly for cellular uptake. The results were
positive (similar to results in related experiments
with tat1-72).



20 



TATAcvs-249 Genetic Fusion 

For bacterial expression of the tatΔcys
transport polypeptide genetically fused to the amino
terminus of the native E2 repressor protein (i.e., the
carboxy-terminal 249 amino acids of BPV-1 E2), we
constructed plasmid pTATΔcys-249 as follows. We
constructed plasmid pFTE501 (Figure 9) from plasmids
pTAT72 (Frankel and Pabo, supra) and pXB314 (Barsoum
et al., supra). From plasmid pXB314, we isolated the
NcoI-SpeI DNA fragment encoding the 249 amino acid
BPV-1 E2 repressor. (NcoI cleaves at nucleotide 296, and
SpeI cleaves at nucleotide 1118 of pXB314.) We blunted
the ends of this fragment by DNA polymerase I Klenow
treatment and added a commercially available BglII
linker (New England Biolabs, cat. no. 1090). We
inserted this linker-bearing fragment into BamHI-
cleaved (complete digestion) plasmid pTAT72. In
pTAT72, there is a BamHI cleavage site within the tat
coding region, near its 3' end, and a second BamHI
cleavage site slightly downstream of the tat gene. The
BglII linker joined the tat and E2 coding sequences in
frame to encode a fusion of the first 62 amino acids of
tat protein followed by a serine residue and the last
249 amino acids of BPV-1 E2 protein. We designated
this bacterial expression plasmid pFTE501 (Figure 9).
To construct plasmid pTATΔcys-249 (Figure 10; SEQ ID
NO:22), we inserted the 762 base pair EagI fragment
from plasmid pTATΔcys, which includes the portion of
tat containing the cysteine deletion, into the 4812
base pair EagI fragment of plasmid pFTE501.
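
The fragment sizes quoted in this construction can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming linear map coordinates and ignoring any bases gained or lost by blunting, linker addition, or at the ligation junctions; the function names are hypothetical.

```python
# Illustrative sketch with hypothetical helper names. Sizes are plain
# arithmetic estimates and ignore end-polishing and linker effects.

def fragment_length(cut_5prime, cut_3prime):
    """Length in base pairs between two cut positions on the same map."""
    if cut_3prime <= cut_5prime:
        raise ValueError("3' cut position must be greater than 5' cut position")
    return cut_3prime - cut_5prime


def ligation_product_length(*fragment_lengths):
    """Approximate size of a plasmid assembled from the given fragments."""
    return sum(fragment_lengths)


if __name__ == "__main__":
    # NcoI at nucleotide 296 and SpeI at nucleotide 1118 of pXB314 (from the text)
    print("NcoI-SpeI fragment:", fragment_length(296, 1118), "bp")
    # 762 bp EagI insert ligated into the 4812 bp EagI vector fragment
    print("EagI insert + vector:", ligation_product_length(762, 4812), "bp")
```

The first figure agrees with the 822 base pair NcoI-SpeI fragment reported for pFTE403 in Example 13.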

Purification of tatΔcys-249

We thawed 5 g of E. coli expressing tatΔcys-249
and suspended the cells in 40 ml of 25 mM Tris HCl
(pH 7.5), 25 mM NaCl, 0.5 mM EDTA, 5 mM DTT, plus
protease inhibitors (1.25 mM PMSF, 3 mM benzamidine,
50 µg/ml pepstatin A, 50 µg/ml aprotinin, 4 µg/ml E64).
We lysed the cells by two passages through a French
pressure cell at 10,000 psi. We removed insoluble
debris from the lysate by centrifugation at 12,000 rpm
in an SA600 rotor, for 1 hour. We purified the
tatΔcys-249 from the soluble fraction. The supernatant
was loaded onto a 2 ml S Sepharose Fast Flow column
(Pharmacia LKB, Piscataway, NJ) at a flow rate of
6 ml/h. The column was washed with 25 mM Tris HCl
(pH 7.5), 25 mM NaCl, 0.5 mM EDTA, 1 mM DTT and treated
with sequential salt steps in the same buffer
containing 100, 200, 300, 400, 500, 600, and 800 mM
NaCl. We recovered the tatΔcys-249 in the 600-800 mM
salt fractions. We pooled the peak fractions, added
glycerol to 15%, and stored aliquots at -70°C.

Immunofluorescence Assay

To analyze cellular uptake of the tatΔcys-E2
repressor fusion protein, we used indirect
immunofluorescence techniques. We seeded HeLa cells
onto cover slips in 6-well tissue culture dishes, to
50% confluence. After an overnight incubation, we
added the tatΔcys-E2 repressor fusion protein (1 µg/ml
final concentration) and chloroquine (0.1 mM final
concentration). After six hours, we removed the fusion
protein/chloroquine-containing growth medium and washed
the cells twice with PBS. We fixed the washed cells in
3.5% formaldehyde at room temperature. We
permeabilized the fixed cells with 0.2% Triton X-100/2%
bovine serum albumin ("BSA") in PBS containing 1 mM
MgCl2/0.1 mM CaCl2 ("PBS+") for 5 minutes at room
temperature. To block the permeabilized cells, we
treated them with PBS containing 2% BSA, for 1 hour at
4°C.

We incubated the cover slips with 20 µl of a
primary antibody solution in each well, at a 1:100
dilution in PBS+ containing 2% BSA, for 1 hour at 4°C.
The primary antibody was either a rabbit polyclonal
antibody to the BPV-1 E2 repressor (generated by
injecting the purified carboxy-terminal 85 amino acids
of E2), or a rabbit polyclonal antibody to tat
(generated by injecting the purified amino-terminal 72
amino acids of tat protein). We added a secondary
antibody at a 1:100 dilution in 0.2% Tween-20/2% BSA in
PBS+ for 30 minutes at 4°C.

The secondary antibody was a rhodamine-
conjugated goat anti-rabbit IgG (Cappel no. 2212-0081).
Following incubation of the cells with the secondary
antibody, we washed the cells with 0.2% Tween 20/2% BSA
in PBS+ and mounted the cover slips in 90% glycerol, 25
mM sodium phosphate (pH 7.2), 150 mM NaCl. We examined
the cells with a fluorescent microscope having a
rhodamine filter.

Cellular Uptake of TatΔcys Fusions

We observed significant cellular uptake of
the tatΔcys-E2 repressor fusion protein, using either
the tat antibody or the E2 antibody. In control cells
exposed to the unconjugated tat protein, we observed
intracellular fluorescence using the tat antibody, but
not the E2 antibody. In control cells exposed to a
mixture of the unconjugated E2 repressor and tat
protein or tatΔcys, we observed fluorescence using the
tat antibody, but not the E2 antibody. This verified
that tat mediates E2 repressor uptake only when the
repressor is linked to the tat protein. As with
unconjugated tat protein, we observed the tatΔcys-E2
repressor fusion protein throughout the cells, but it
was concentrated in intracellular vesicles. These
results show that a tat-derived polypeptide completely
lacking cysteine residues can carry a heterologous
protein (i.e., transport polypeptide-cargo protein
genetic fusion) into animal cells.
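
The residue bookkeeping behind the cysteine-deleted transport polypeptide can be illustrated in a few lines. This is a sketch only: it assumes the commonly cited HIV-1 (HXB2) tat 1-72 sequence, which may differ from the sequence listings of this application, and the helper names are hypothetical.

```python
# Illustrative sketch. TAT_1_72 is the commonly cited HXB2 tat sequence and is
# an assumption here; the application's own sequence listings are authoritative.
TAT_1_72 = (
    "MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFIT"
    "KALGISYGRKKRRQRRRPPQGSQTHQVSLSKQ"
)


def residues(seq, first, last):
    """Return residues first..last using 1-based, inclusive numbering."""
    return seq[first - 1:last]


def delete_residues(seq, first, last):
    """Return seq with residues first..last (1-based, inclusive) removed."""
    return seq[:first - 1] + seq[last:]


if __name__ == "__main__":
    deleted = delete_residues(TAT_1_72, 22, 37)   # tat residues 1-21, 38-72
    print("cysteine-rich region:", residues(TAT_1_72, 22, 37))
    print("length after deletion:", len(deleted))
    print("cysteines remaining:", deleted.count("C"))
    print("basic region 49-57:", residues(TAT_1_72, 49, 57))
```

Under that assumed sequence, removing residues 22-37 leaves a 56-residue polypeptide with no cysteines while keeping the basic region intact.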

In a procedure similar to that described
above, we produced a genetic fusion of tatΔcys to the
C-terminal 123 amino acids of HPV E2. When added to
the growth medium, this fusion polypeptide exhibited
repression of E2-dependent gene expression in COS7
cells (data not shown).
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EXAMPLE 11 

Antisense Oligodeoxynucleotide Conjugates

Using an automated DNA/RNA synthesizer
(Applied Biosystems model 394), we synthesized DNA
phosphorothioate analogs (4-18 nucleotides in length),
with each containing a free amino group at the 5' end.
The amine group was incorporated into the
oligonucleotides using commercially modified
nucleotides (aminolink 2, Applied Biosystems). The
oligonucleotides corresponded to sense and antisense
strands from regions of human growth hormone and CAT
messenger RNA.

For each cross-linking reaction, we dissolved
200 µg of an oligonucleotide in 100 µl of 25 mM sodium
phosphate buffer (pH 7.0). We then added 10 µl of a 50
mM stock solution of sulfo-SMCC and allowed the
reaction to proceed at room temperature for 1 hour. We
removed unreacted sulfo-SMCC by gel filtration of the
reaction mixture on a P6DG column (Bio-Rad) in 25 mM
HEPES (pH 6.0). We dried the oligonucleotide-sulfo-
SMCC adduct under a vacuum. Recovery of the
oligonucleotides in this procedure ranged from 58 to
95%. For reaction with a transport polypeptide, we
redissolved each oligonucleotide-sulfo-SMCC adduct in
50 µl of 0.5 mM EDTA, transferred the solution to a
test tube containing 50 µg of lyophilized transport
polypeptide, and allowed the reaction to proceed at
room temperature for 2 hours. We analyzed the reaction
products by SDS-PAGE.
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EXAMPLE 12 
Antibody Conjugates

Anti-Tubulin Conjugate 1

We obtained commercial mouse IgG1 mAb anti-
tubulin (Amersham) and purified it from ascites by
conventional methods, using protein A. We labelled the
purified antibody with rhodamine isothiocyanate, at 1.2
moles rhodamine/mole Ab. When we exposed fixed,
permeabilized HeLa cells to the labelled antibody,
microscopic examination revealed brightly stained
microtubules. Although the rhodamine labelling was
sufficient, we enhanced the antibody signal with anti-
mouse FITC.

In a procedure essentially as described in
Example 2 (above), we allowed 250 µg of the antibody to
react with a 10:1 molar excess of sulfo-SMCC. We then
added 48 µg of (35S-labelled) tat1-72. The molar ratio
of tat1-72:Ab was 2.7:1. According to incorporation of
radioactivity, the tat1-72 was cross-linked to the
antibody in a ratio of 0.6:1.

For analysis of uptake of the tat1-72-Ab
conjugate, we added the conjugate to medium (10 µg/ml)
bathing cells grown on coverslips. We observed a
punctate pattern of fluorescence in the cell. The
punctate pattern indicated vesicular location of the
conjugate, and was therefore inconclusive as to
cytoplasmic delivery.

To demonstrate immunoreactivity of the
conjugated antibody, we tested its ability to bind
tubulin. We coupled purified tubulin to cyanogen
bromide-activated Sepharose 4B (Sigma Chem. Co., St.
Louis, MO). We applied a sample of the radioactive
conjugate to the tubulin column (and to a Sepharose 4B
control column) and measured the amount of bound
conjugate. More radioactivity bound to the affinity
matrix than to the control column, indicating tubulin
binding activity.

Anti-Tubulin Conjugate 2 

In a separate cross-linking experiment, we
obtained an anti-tubulin rat monoclonal antibody IgG2a
(Serotec), and purified it from ascites by conventional
procedures, using protein G. We eluted the antibody
with Caps buffer (pH 10). The purified antibody was
positive in a tubulin-binding assay. We allowed tat1-
72 to react with rhodamine isothiocyanate at a molar
ratio of 1:1. The reaction product exhibited an
A555/A280 ratio of 0.63, which indicated a substitution
of approximately 0.75 mole of dye per mole of tat1-72.
Upon separation of the unreacted dye from the tat1-72-
rhodamine, by G-25 gel filtration (Pharmacia LKB,
Piscataway, NJ), we recovered only 52 µg out of 150 µg
of tat1-72 used in the reaction.

We saved an aliquot of the tat1-72-rhodamine
for use (as a control) in cellular uptake experiments,
and added the rest to 0.4 mg of antibody that had
reacted with SMCC (20:1). The reaction mixture
contained a tat1-72:Ab ratio of approximately 1:1,
rather than the intended 5:1. (In a subsequent
experiment, the 5:1 ratio turned out to be
unsatisfactory, yielding a precipitate.) We allowed
the cross-linking reaction to proceed overnight at 4°C.
We then added a molar excess of cysteine to block the
remaining maleimide groups and thus stop the cross-
linking reaction. We centrifuged the reaction mixtures
to remove any precipitate present.

We carried out electrophoresis using a 4-20%
polyacrylamide gradient gel to analyze the supernatant
under reducing and non-reducing conditions. We also
analyzed the pellets by this procedure. In
supernatants from antibody-tat1-72 (without rhodamine)
conjugation experiments, we observed very little
material on the 4-20% gel. However, in supernatants
from the antibody-tat1-72-rhodamine conjugation
experiments, we observed relatively heavy bands above
the antibody, for the reduced sample. The antibody
appeared to be conjugated to the tat1-72 in a ratio of
approximately 1:1.

In cellular uptake experiments carried out
with conjugate 2 (procedure as described above for
conjugate 1), we obtained results similar to those
obtained with conjugate 1. When visualizing the
conjugate by rhodamine fluorescence or by fluorescein
associated with a second antibody, we observed the
conjugate in vesicles.

EXAMPLE 13 
Additional Tat-E2 Conjugates 



Chemically Cross-Linked Tat-E2 Conjugates 
We chemically cross-linked transport
polypeptide tat37-72 to four different repressor forms
of E2. The four E2 repressor moieties used in these
experiments were the carboxy-terminal 103 residues
(i.e., 308-410) of BPV-1 ("E2.103"); the carboxy-
terminal 249 residues (i.e., 162-410) of BPV-1
("E2.249"); the carboxy-terminal 121 residues (i.e.,
245-365) of HPV-16 ("HE2"); and the carboxy-terminal
121 residues of HPV-16, in which the cysteine residues
at positions 300 and 309 were changed to serine, and
the lysine residue at position 299 was changed to
arginine ("HE2CCSS"). The recombinant production and
purification of HE2 and HE2CCSS, followed by chemical
cross-linking of HE2 and HE2CCSS to tat37-72, to form
TxHE2 and TxHE2CCSS, respectively, are described in
Example 9 (above). For the chemical cross-linking of
E2.103 and E2.249 to tat37-72 (to yield the conjugates
designated TxE2.103 and TxE2.249), we employed the same
method used to make TxHE2 and TxHE2CCSS (Example 9,
supra).

We expressed the protein E2.103 in E. coli
from plasmid pET-E2.103. We obtained pET-E2.103 by a
PCR cloning procedure analogous to that used to produce
pET8c-123, described in Example 9 (above) and Figure 5.
As in the construction of pET8c-123, we ligated a PCR-
produced NcoI-BamHI E2 fragment into NcoI-BamHI-cleaved
pET8c. Our PCR template for the E2 fragment was
plasmid pGO-E2 (Hawley-Nelson et al., EMBO J., vol. 7,
pp. 525-31 (1988); United States patent 5,219,990).
The oligonucleotide primers used to produce the E2
fragment from pGO-E2 were EA21 (SEQ ID NO:36) and EA22
(SEQ ID NO:37). Primer EA21 introduced an NcoI site
that added a methionine codon followed by an alanine
codon 5' adjacent to the coding region for the carboxy-
terminal 101 residues of BPV-1 E2.

We expressed the protein E2.249 in E. coli
from plasmid pET8c-249. We constructed pET8c-249 by
inserting the 1362 bp NcoI-BamHI fragment of plasmid
pXB314 (Figure 9) into NcoI-BamHI-cleaved pET8c (Figure
5).

TATΔcys-BPV E2 Genetic Fusions

In addition to TATΔcys-249, we tested several
other TATΔcys-BPV-1 E2 repressor fusions. Plasmid
pTATΔcys-105 encoded tat residues 1-21 and 38-67,
followed by the carboxy-terminal 105 residues of BPV-1
E2. Plasmid pTATΔcys-161 encoded tat residues 1-21 and
38-62, followed by the carboxy-terminal 161 residues of
BPV-1 E2. We constructed plasmids pTATΔcys-105 and
pTATΔcys-161 from intermediate plasmids pFTE103 and
pFTE403, respectively.

We produced pFTE103 and pFTE403 (as well as
pFTE501) by ligating different inserts into
BamHI-cleaved (complete digestion) vector pTAT72.

To obtain the insertion fragment for pFTE103,
we isolated a 929 base pair PleI-BamHI fragment from
pXB314 and ligated it to a double-stranded linker
consisting of synthetic oligonucleotide FTE.3 (SEQ ID
NO:23) and synthetic oligonucleotide FTE.4 (SEQ ID
NO:24). The linker encoded tat residues 61-67 and had
a BamHI overhang at the 5' end and a PleI overhang at
the 3' end. We ligated the linker-bearing fragment
from pXB314 into BamHI-cleaved pTAT72, to obtain
pFTE103. To obtain the insertion fragment for pFTE403,
we digested pXB314 with NcoI and SpeI, generated blunt
ends with Klenow treatment and ligated a BglII linker
consisting of GAAGATCTTC (New England Biolabs, Beverly,
MA, Cat. No. 1090) (SEQ ID NO:35) duplexed with itself.
We purified the resulting 822-base pair fragment by
electrophoresis and then ligated it into BamHI-digested
pTAT72 vector, to obtain pFTE403.

To delete tat residues 22-37, thereby
obtaining plasmid pTATΔcys-105 from pFTE103 and
pTATΔcys-161 from pFTE403, we employed the same method
(described above) used to obtain plasmid pTATΔcys-249
from pFTE501.

TATΔcys-HPV E2 Genetic Fusions

We constructed plasmids pTATΔcys-HE2.85 and
pTATΔcys-HE2.121 to encode a fusion protein consisting
of the tatΔcys transport moiety (tat residues 1-21,
38-72) followed by the carboxy-terminal 85 or 121
residues of HPV-16 E2, respectively.






Our starting plasmids in the construction of
pTATΔcys-HE2.85 and pTATΔcys-HE2.121 were,
respectively, pET8c-85 and pET8c-123 (both described
above). We digested pET8c-85 and pET8c-123 with BglII
and NcoI, and isolated the large fragment in each case
(4769 base pairs from pET8c-85 or 4880 base pairs from
pET8c-123) for use as a vector. In both vectors, the
E2 coding regions begin at the NcoI site. Into both
vectors, we inserted the 220 bp BglII-AatII fragment
from plasmid pTATΔcys, and a synthetic fragment. The
BglII-AatII fragment, whose 5' end is upstream of the
T7 promoter, encodes the first 40 residues of
tatΔcys (i.e., residues 1-21, 38-56). The synthetic
fragment, consisting of annealed oligonucleotides 374.67
(SEQ ID NO:25) and 374.68 (SEQ ID NO:26), encoded tat
residues 57-72, with an AatII overhang at the 5' end
and an NcoI overhang at the 3' end.

JB Series of Genetic Fusions 

Plasmid pJB106 encodes a fusion protein
(Figure 12) (SEQ ID NO:38) in which an amino-terminal
methionine residue is followed by tat residues 47-58
and then HPV-16 E2 residues 245-365. To obtain pJB106,
we carried out a three-way ligation, schematically
depicted in Figure 11. We generated a 4602 base pair
vector fragment by digesting plasmid pET8c with NcoI
and BamHI. One insert was a 359 base pair MspI-BamHI
fragment from pET8c-123, encoding HPV-16 E2 residues
248-365. The other insert was a synthetic fragment
consisting of the annealed oligonucleotide pair,
374.185 (SEQ ID NO:27) and 374.186 (SEQ ID NO:28). The
synthetic fragment encoded the amino-terminal
methionine and tat residues 47-58, plus HPV16 residues
245-247 (i.e., ProAspThr). The synthetic fragment had
an NcoI overhang at the 5' end and an MspI overhang at
the 3' end.

We obtained plasmids pJB117 (SEQ ID NO:59),
pJB118 (SEQ ID NO:60), pJB119 (SEQ ID NO:61), pJB120
(SEQ ID NO:62) and pJB122 (SEQ ID NO:63) by PCR
deletion cloning in a manner similar to that used for
pTATΔcys (described above and in Figure 8). We
constructed plasmids pJB117 and pJB118 by deleting
segments of pTATΔcys-HE2.121. We constructed plasmids
pJB119 and pJB120 by deleting segments of
pTATΔcys-161. In all four clonings, we used PCR primer
374.122 (SEQ ID NO:29) to cover the HindIII site
downstream of the tat-E2 coding region. In each case,
the other primer spanned the NdeI site at the start of
the tatΔcys coding sequence, and deleted codons for
residues at the beginning of tatΔcys (i.e., residues
2-21 and 38-46 for pJB117 and pJB119; and residues 2-21
for pJB118 and pJB120). For deletion of residues
2-21, we used primer 379.11 (SEQ ID NO:30). For deletion
of residues 2-21 and 38-46, we used primer 379.12 (SEQ
ID NO:31). Following the PCR, we digested the
PCR products with NdeI and HindIII. We then cloned the
resulting restriction fragments into vector pTATΔcys-
HE2.121, which had been previously digested with NdeI
plus HindIII to yield a 4057 base pair receptor
fragment. Thus, we constructed expression plasmids
encoding fusion proteins consisting of amino acid
residues as follows:

JB117 = Tat47-72-HPV16 E2 245-365;
JB118 = Tat38-72-HPV16 E2 245-365;
JB119 = Tat47-62-BPV1 E2 250-410; and
JB120 = Tat38-62-BPV1 E2 250-410.
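
The composition of these fusions lends itself to a small data structure. The sketch below is an illustration with hypothetical names; it records each construct as a list of residue ranges and computes the expected polypeptide length, assuming the initiating methionine that precedes the tat sequence in the pJB constructs is counted as one extra residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A contiguous residue range taken from a source protein (1-based, inclusive)."""
    source: str
    first: int
    last: int

    def __len__(self):
        return self.last - self.first + 1


# Residue ranges as listed in the text for the four deletion constructs.
CONSTRUCTS = {
    "JB117": [Segment("tat", 47, 72), Segment("HPV-16 E2", 245, 365)],
    "JB118": [Segment("tat", 38, 72), Segment("HPV-16 E2", 245, 365)],
    "JB119": [Segment("tat", 47, 62), Segment("BPV-1 E2", 250, 410)],
    "JB120": [Segment("tat", 38, 62), Segment("BPV-1 E2", 250, 410)],
}


def fusion_length(segments, initiator_met=True):
    """Total residues in a fusion; optionally count the amino-terminal Met."""
    return sum(len(seg) for seg in segments) + (1 if initiator_met else 0)


if __name__ == "__main__":
    for name, segments in CONSTRUCTS.items():
        parts = " + ".join(f"{s.source} {s.first}-{s.last}" for s in segments)
        print(f"{name}: Met + {parts} = {fusion_length(segments)} residues")
```

Whether the methionine is retained in the expressed protein is not stated, so `initiator_met=False` gives the length without it.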

We constructed pJB122, encoding tat residues
38-58 followed by HPV16 E2 residues 245-365 (i.e., the
E2 carboxy-terminal 121 amino acids), by deleting from
pJB118 codons for tat residues 59-72. We carried out
this deletion by PCR, using primer 374.13 (SEQ ID
NO:32), which covers the AatII site within the tat
coding region, and primer 374.14 (SEQ ID NO:33), which
covers the AatII site slightly downstream of the unique
HindIII site downstream of the Tat-E2 gene. We
digested the PCR product with AatII and isolated the
resulting restriction fragment. In the final pJB122
construction step, we inserted the isolated AatII
fragment into AatII-digested vector pJB118.

It should be noted that in all five of our
pJB constructs described above, the tat coding sequence
was preceded by a methionine codon for initiation of
translation.

Purification of Tat-E2 Fusion Proteins

In all cases, we used E. coli to express our
tat-E2 genetic fusions. Our general procedure for
tat-E2 protein purification included the following
initial steps: pelleting the cells; resuspending them
in 8-10 volumes of lysis buffer (25 mM Tris (pH 7.5),
25 mM NaCl, 1 mM DTT, 0.5 mM EDTA) containing protease
inhibitors (generally, 1 mM PMSF, 4 µg/ml E64, 50
µg/ml aprotinin, 50 µg/ml pepstatin A, and 3 mM
benzamidine); lysing the cells in a French press (2
passes at 12,000 psi); and centrifuging the lysates at
10,000-12,000 x g for 1 hour (except FTE proteins), at
4°C. Additional steps employed in purifying
particular tat-E2 fusion proteins are described below.

E2.103 and E2.249: Following centrifugation
of the lysate, we loaded the supernatant onto a Fast S
Sepharose column and eluted the E2.103 or E2.249
protein with 1 M NaCl. We then further purified the
E2.103 or E2.249 by chromatography on a P60 gel
filtration column equilibrated with 100 mM HEPES (pH
7.5), 0.1 mM EDTA and 1 mM DTT.

FTE103: Following centrifugation of the
lysate at 10,000 x g for 10 min. at 4°C, we recovered
the FTE103 protein (which precipitated) by resuspending
the pellet in 6 M urea and adding solid guanidine-HCl
to a final concentration of 7 M. After centrifuging
the suspension, we purified the FTE103 protein from the
supernatant by chromatography on an A.5M gel filtration
column in 6 M guanidine, 50 mM sodium phosphate (pH
5.4), 10 mM DTT. We collected the FTE103-containing
fractions from the gel filtration column according to
the appearance of a band having an apparent molecular
weight of 19 kDa on Coomassie-stained SDS
polyacrylamide electrophoresis gels.

FTE403: Our purification procedure for
FTE403 was essentially the same as that for FTE103,
except that FTE403 migrated on the gel filtration
column with an apparent molecular weight of 25 kDa.

FTE501: Following centrifugation of the
lysate at 10,000 x g, for 30 minutes, we resuspended
the pellet in 6 M urea, added solid guanidine-HCl to a
final concentration of 6 M, and DTT to a concentration
of 10 mM. After 30 minutes at 37°C, we clarified the
solution by centrifugation at 10,000 x g for 30
minutes. We then loaded the sample onto an A.5 agarose
gel filtration column in 6 M guanidine-HCl, 50 mM
sodium phosphate (pH 5.4), 10 mM DTT and collected the
FTE501-containing fractions from the gel filtration
column, according to the appearance of a band having an
apparent molecular weight of 40 kDa on Coomassie-
stained SDS polyacrylamide electrophoresis gels. We
loaded the gel filtration-purified FTE501 onto a C18
reverse phase HPLC column and eluted with a gradient of
0-75% acetonitrile in 0.1% trifluoroacetic acid. We
collected the FTE501 protein in a single peak with an
apparent molecular weight of 40 kDa.

TatΔcys-105: Following centrifugation of
the lysate, we loaded the supernatant onto a Q-
Sepharose column equilibrated with 25 mM Tris (pH 7.5),
0.5 mM EDTA. We loaded the Q-Sepharose column flow-
through onto an S-Sepharose column equilibrated with 25
mM MES (pH 6.0), after adjusting the Q-Sepharose column
flow-through to about pH 6.0 by adding MES (pH 6.0) to
a final concentration of 30 mM. We recovered the
tatΔcys-105 protein from the S-Sepharose column by
application of sequential NaCl concentration steps in
25 mM MES (pH 6.0). TatΔcys-105 eluted in the pH 6.0
buffer at 800-1000 mM NaCl.

TatΔcys-161: Following centrifugation of
the lysate, we loaded the supernatant onto an
S-Sepharose column equilibrated with 25 mM Tris (pH
7.5), 0.5 mM EDTA. We recovered the tatΔcys-161 from
the S-Sepharose column by application of a NaCl step
gradient in 25 mM Tris (pH 7.5). TatΔcys-161 eluted in
the pH 7.5 buffer at 500-700 mM NaCl.

TatΔcys-249: Following centrifugation of
the lysate, we loaded the supernatant onto a
Q-Sepharose column equilibrated with 25 mM Tris (pH
7.5), 0.5 mM EDTA. We recovered the tatΔcys-249 from
the S-Sepharose column by application of a NaCl step
gradient in 25 mM Tris (pH 7.5). TatΔcys-249 eluted in
the 600-800 mM portion of the NaCl step gradient.

TatΔcys-HE2.85 and TatΔcys-HE2.121:
Following centrifugation of the lysate, we loaded the
supernatant onto a Q-Sepharose column. We loaded the
flow-through onto an S-Sepharose column. We recovered
the tatΔcys-HE2.85 or tatΔcys-HE2.121 from the
S-Sepharose column by application of a NaCl step
gradient. Both proteins eluted with 1 M NaCl.

HPV E2 and HPV E2CCSS: See Example 9
(above).

JB106: Following centrifugation of the
lysate, and collection of the supernatant, we added
NaCl to 300 mM. We loaded the supernatant with added
NaCl onto an S-Sepharose column equilibrated with 25 mM
HEPES (pH 7.5). We treated the column with sequential
salt concentration steps in 25 mM HEPES (pH 7.5), 1.5
mM EDTA, 1 mM DTT. We eluted the JB106 protein from
the S-Sepharose column with 1 M NaCl.

JB117: Following centrifugation of the
lysate, and collection of the supernatant, we added
NaCl to 300 mM. Due to precipitation of JB117 at 300
mM NaCl, we diluted the JB117 supernatant to 100 mM
NaCl and batch-loaded the protein onto the S-Sepharose
column. We eluted JB117 from the S-Sepharose column
with 1 M NaCl in 25 mM Tris (pH 7.5), 0.3 mM DTT.

JB118: Following centrifugation of the
lysate, and collection of the supernatant, we added
NaCl to 300 mM. We loaded the supernatant with added
NaCl onto an S-Sepharose column equilibrated with 25 mM
Tris (pH 7.5). We eluted the JB118 protein from the S-
Sepharose column with 1 M NaCl in 25 mM Tris (pH 7.5),
0.3 mM DTT.

JB119, JB120, JB121 and JB122: Following
centrifugation of the lysate, and collection of the
supernatant, we added NaCl to 150 mM for JB119 and
JB121, and 200 mM for JB120 and JB122. We loaded the
supernatant with added NaCl onto an S-Sepharose column
equilibrated with 25 mM Tris (pH 7.5). We eluted
proteins JB119, JB120, JB121 and JB122 from the S-
Sepharose column with 1 M NaCl in 25 mM Tris (pH 7.5),
0.3 mM DTT.
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EXAMPLE 14 

E2 Repression Assays - Additional Conjugates

We tested our tat-E2 fusion proteins for
inhibition of transcriptional activation by the
full-length papillomavirus E2 protein ("repression").
We measured E2 repression with a transient
co-transfection assay in COS7 cells. The COS7 cells
used in this assay were maintained in culture for only
short periods of time. We thawed the COS7 cells at
passage 13 and used them only through passage 25. Long
periods of propagation led to low levels of E2
transcriptional activation and decreased repression and
reproducibility. Our repression assay and method of
computing repression activity are described in Example
9 (above). For the conjugates TxE2.103, TxE2.249,
FTE103, FTE208, FTE403 and FTE501, we substituted the
BPV-1 E2 transactivator, in equal amount, for the
HPV-16 E2 transactivator. Accordingly, instead of
transfecting with the HPV-16 E2 expression plasmid
pAHE2, we transfected with the BPV-1 E2 expression
plasmid pXB323, which is fully described in United
States patent 5,219,990.

The genetic fusion protein JB106 has
consistently been our most potent tat-E2 repressor
conjugate. Data from a repression assay comparing
JB106 and TxHE2CCSS are shown in Table III. Figure 13
graphically depicts the results presented in Table III.

In addition to JB106, several other tat-E2
repressor conjugates have yielded significant
repression. As shown in Table IV, TxHE2, TxHE2CCSS,
JB117, JB118, JB119, JB120 and JB122 displayed
repression levels in the ++ range.
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TABLE III 







Protein added                    average of   average           %
(µg/ml)               cpm-bkgd*  duplicates   cpm-bkgd  repression

0                         3,872
0                         3,694       3,783         --          --
0                        17,896
0                        18,891      18,393     14,610
1       JB106            16,384
1       JB106            17,249      16,816     13,033        10.8
3       JB106            11,456
3       JB106            10,550      11,003      7,220        50.6
10      JB106             6,170
10      JB106             7,006       6,588      2,805        81.0
30      JB106             4,733
30      JB106             4,504       4,618        835        94.3
1       TxHE2CCSS        17,478
1       TxHE2CCSS        18,047      17,762     13,979         4.3
3       TxHE2CCSS        14,687
3       TxHE2CCSS        15,643      15,165     11,382        22.1
10      TxHE2CCSS        12,914
10      TxHE2CCSS        12,669      12,791      9,008        38.3
30      TxHE2CCSS         7,956
30      TxHE2CCSS         8,558       8,257      4,474        69.4
1       HE2.123          18,290
1       HE2.123          18,744      18,517     14,734           0
3       HE2.123          17,666
3       HE2.123          18,976      18,321     14,538         1.3
10      HE2.123          18,413
10      HE2.123          17,862      18,137     14,354         2.6
30      HE2.123          18,255
30      HE2.123          18,680      18,467     14,684         0.3

* Bkgd = 158 cpm.
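
The percentages in Table III can largely be reproduced from the duplicate counts. The sketch below is an inference from the tabulated numbers, not the method of Example 9: it assumes the first pair of protein-free rows is the basal signal and the second pair is the fully activated signal, and it uses hypothetical function names.

```python
from statistics import mean

# Duplicate counts (cpm after background subtraction) taken from Table III.
# Treating the first pair as basal and the second pair as fully activated
# signal is an assumption inferred from the table, not stated in the text.
BASAL = [3872, 3694]
ACTIVATED = [17896, 18891]


def net_signal(duplicates, basal=BASAL):
    """Mean of the duplicates minus the mean basal signal."""
    return mean(duplicates) - mean(basal)


def percent_repression(duplicates, activated=ACTIVATED, basal=BASAL):
    """Percent loss of E2-dependent signal relative to the untreated control."""
    reference = net_signal(activated, basal)
    return 100.0 * (1.0 - net_signal(duplicates, basal) / reference)


if __name__ == "__main__":
    samples = {
        "JB106, 1 ug/ml": [16384, 17249],
        "JB106, 3 ug/ml": [11456, 10550],
        "JB106, 30 ug/ml": [4733, 4504],
        "TxHE2CCSS, 30 ug/ml": [7956, 8558],
    }
    for label, counts in samples.items():
        print(f"{label}: {percent_repression(counts):.1f}% repression")
```

Under these assumptions most rows match the table to one decimal place. A few entries (the 10 µg/ml JB106 row and the HE2.123 rows) differ slightly, which suggests a different reference value was used for them.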
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Table IV summarizes our tat-E2 repressor 
assay results. Although we tested all of our tat-E2
repressor conjugates in similar assays, the conjugates
were not all simultaneously tested in the same assay.
Accordingly, we have expressed the level of repression
activity, semi-quantitatively, as +++, ++, +, +/-,
or -, with +++ being strong repression, and - being no
detectable repression. Figure 13 illustrates the
repression activity rating system used in Table IV.
JB106 exemplifies the +++ activity level. TxHE2CCSS
exemplifies the ++ activity level. The negative
control, HE2.123, exemplifies the - activity level.
The + activity level is intermediate between the
activity observed with TxHE2CCSS and HE2.123. The two
conjugates whose activity is shown as +/- had weak (but
detectable) activity in some assays and no detectable
activity in other assays.
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TABLE IV 



Protein            Tat residues    E2 residues       Repression Level

TxE2.103           37-72           BPV-1 308-410
TxE2.249           37-72           BPV-1 162-410
TxHE2              37-72           HPV-16 245-365    ++
TxHE2CCSS          37-72           HPV-16 245-365    ++
FTE103             1-67            BPV-1 306-410     -
FTE208             1-62            BPV-1 311-410     -
FTE403             1-62            BPV-1 250-410     -
FTE501             1-62            BPV-1 162-410     -
TatΔcys-105        1-21, 38-67     BPV-1 306-410
TatΔcys-161        1-21, 38-62     BPV-1 250-410
TatΔcys-249        1-21, 38-62     BPV-1 162-410
TatΔcys-HE2.85     1-21, 38-72     HPV-16 281-365
TatΔcys-HE2.121    1-21, 38-72     HPV-16 245-365
JB106              47-58           HPV-16 245-365    +++
JB117              47-72           HPV-16 245-365    ++
JB118              38-72           HPV-16 245-365    ++
JB119              47-62           BPV-1 250-410     ++
JB120              38-62           BPV-1 250-410     ++
JB122              38-58           HPV-16 245-365    ++

[Repression levels for the rows left blank, including the
two +/- entries, are not legible in this copy.]




FTE103, FTE403, FTE208 and FTE501, the four
conjugates having the tat amino-terminal region (i.e.,
residues 1-21) and the cysteine-rich region (i.e.,
residues 22-37), were completely defective for
repression. Since we have shown, by indirect
immunofluorescence, that FTE501 enters cells, we
consider it likely that the E2 repressor activity has
been lost in the FTE series as a result of the linkage
to the tat transport polypeptide. Our data show that
the absence of the cysteine-rich region of the tat
moiety generally increased E2 repressor activity. In
addition, the absence of the cysteine-rich region in
tat-E2 conjugates appeared to increase protein
production levels in E. coli and increase protein
solubility, without loss of transport into target
cells. Deletion of the amino-terminal region of tat
also increased E2 repressor activity. Fusion protein
JB106, with only tat residues 47-58, was the most
potent of our tat-E2 repressor conjugates. However,
absence of the tat cysteine-rich region does not always
result in preservation of E2 repressor activity in the
conjugate. For example, the chemical conjugate
TXE2.249 was insoluble and toxic to cells. Thus,
linkage of even a cysteine-free portion of tat may lead
to a non-functional E2 repressor conjugate.
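
Several conjugate names in Table IV end in a number (103, 249, 105, 161, 85, 121) that appears to equal the number of E2 residues in the construct. The short Python sketch below checks that reading against the E2 residue ranges listed in the table. It is an illustration only: the function and variable names are hypothetical, "delta" stands in for the Greek letter in the conjugate names, and the link between name and residue count is an inference, not something the specification states.

```python
def span_length(first: int, last: int) -> int:
    """Count residues in the inclusive range first..last."""
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


# (conjugate name, numeric suffix of the name, E2 residue range from Table IV).
# "delta" stands in for the Greek capital delta used in the conjugate names.
CONJUGATES = [
    ("TxE2.103", 103, (308, 410)),
    ("TXE2.249", 249, (162, 410)),
    ("Tat-delta-cys-105", 105, (306, 410)),
    ("Tat-delta-cys-161", 161, (250, 410)),
    ("Tat-delta-cys-249", 249, (162, 410)),
    ("Tat-delta-cys-HE2.85", 85, (281, 365)),
    ("Tat-delta-cys-HE2.121", 121, (245, 365)),
]


def suffix_matches_e2_length() -> dict:
    """Map each conjugate name to True when its suffix equals the E2 span length."""
    return {name: span_length(*e2) == suffix for name, suffix, e2 in CONJUGATES}


if __name__ == "__main__":
    for name, ok in suffix_matches_e2_length().items():
        print(f"{name}: suffix equals E2 residue count -> {ok}")
```

The FTE and JB series are left out because their names do not follow this pattern.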

EXAMPLE 15 

Cleavable E2 Conjugates 

Chemical conjugation of tat moieties to E2
protein resulted in at least a 20-fold reduction in
binding of the E2 protein to E2 binding sites on DNA
(data not shown). Therefore, we conducted experiments
to evaluate cleavable cross-linking between the tat
transport moiety and the E2 repressor moiety. We
tested various cleavable cross-linking methods.

In one series of experiments, we activated
the cysteine sulfhydryl groups of HPV E2-CCSS protein
with aldrithiol in 100 mM HEPES (pH 7.5), 500 mM NaCl.
We isolated the activated E2 repressor by gel
filtration chromatography and treated it with tat37-72.
We achieved low cross-linking efficiency because of
rapid E2-CCSS dimer formation upon treatment with
aldrithiol. To avoid this problem, we put the E2-CCSS
into 8 M urea, at room temperature, and treated it with
aldrithiol at 23°C for 60 minutes under denaturing
conditions. We then refolded the E2CCSS-aldrithiol
adduct, isolated it by gel filtration chromatography,
and then allowed it to react with tat37-72. This
procedure resulted in excellent cross-linking. We also
cross-linked E2CSSS and E2CCSC to tat37-72, using a
modification of the urea method, wherein we used
S-Sepharose chromatography instead of gel filtration to
isolate the E2-aldrithiol adducts. This modification
increased recovery of the adducts and resulted in
cross-linkage of approximately 90% of the E2 starting
material used in the reaction.

The cleavable tat-E2 conjugates exhibited
activity in the repression assay. However, the
repression activity of the cleavable conjugates was
slightly lower than that of similar conjugates
cross-linked irreversibly. The slightly lower activity
of the cleavable conjugates may be a reflection of
protein half-life in the cells. Tat is relatively
stable in cells. E2 proteins generally have short
half-lives in cells. Thus, irreversible cross-linkage
between a tat moiety and an E2 moiety may stabilize the
E2 moiety.

EXAMPLE 16

Herpes Simplex Virus Repressor Conjugate

Herpes simplex virus ("HSV") encodes a
transcriptional activator, VP16, which induces
expression of the immediate early HSV genes. Friedman
et al. have produced an HSV VP16 repressor by deleting
the carboxy-terminal transactivation domain of VP16
("Expression of a Truncated Viral Trans-Activator
Selectively Impedes Lytic Infection by Its Cognate
Virus", Nature, 335, pp. 452-54 (1988)). We have
produced an HSV-2 VP16 repressor in a similar manner.

To test cellular uptake and VP16 repressor
activity of transport polypeptide-VP16 repressor
conjugates, we simultaneously transfected a
VP16-dependent reporter plasmid and a VP16 repressor
plasmid into COS7 cells. Then we exposed the
transfected cells to a transport polypeptide-VP16
repressor conjugate or to an appropriate control. The
repression assay, described below, was analogous to the
E2 repression assay described above, in Example 9.

VP16 Repression Assay Plasmids 

Our reporter construct for the VP16
repression assay was plasmid p175kCAT, obtained from G.
Hayward (see, P. O'Hare and G.S. Hayward, "Expression
of Recombinant Genes Containing Herpes Simplex Virus
Delayed-Early and Immediate-Early Regulatory Regions
and Trans Activation by Herpes Virus Infection", J.
Virol., 52, pp. 522-31 (1984)). Plasmid p175kCAT
contains the HSV-1 IE175 promoter driving a CAT
reporter gene.

Our HSV-2 transactivator construct for the
VP16 repression assay was plasmid pXB324, which
contained the wild-type HSV-2 VP16 gene under the
control of the chicken β-actin promoter. We
constructed pXB324 by inserting into pXB100 (P. Han et
al., "Transactivation of Heterologous Promoters by
HIV-1 Tat", Nuc. Acids Res., 19, pp. 7225-29 (1991)),
between the XhoI site and BamHI site, a 280 base pair
fragment containing the chicken β-actin promoter and a
2318 base pair BamHI-EcoRI fragment from plasmid pCA5
(O'Hare and Hayward, supra) encoding the entire wild
type HSV-2 VP16 protein.

Tat-VP16 Repressor Fusion Protein 

We produced in bacteria fusion protein
tat-VP16R.GF (SEQ ID NO:58), consisting of amino acids
47-58 of HIV tat protein followed by amino acids 43-412 of
HSV VP16 protein. For bacterial production of a
tat-VP16 repressor fusion protein, we constructed plasmid
pET/tat-VP16R.GF, in a three-piece ligation. The first
fragment was the vector pET-3d (described above under
the alternate designation "pET-8c") digested with NcoI
and BglII (approximately 4600 base pairs). The second
fragment consisted of synthetic oligonucleotides
374.219 (SEQ ID NO:39) and 374.220 (SEQ ID NO:40),
annealed to form a double-stranded DNA molecule. The
5' end of the synthetic fragment had an NcoI overhang
containing an ATG translation start codon. Following
the start codon were codons for tat residues 47-58.
Immediately following the tat codons, in frame, were
codons for VP16 residues 43-47. The 3' terminus of the
synthetic fragment was a blunt end for ligation to the
third fragment, an 1134 base pair PvuII-BglII fragment
from pXB324R4, containing codons 48-412 of HSV-2 VP16.
We derived pXB324R4 from pXB324 (described above).
Plasmid pXB324R2 was an intermediate in the
construction of pXB324R4.
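
The residue and codon ranges in the three-piece ligation above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only; the names are hypothetical, and counting the initiator methionine as part of the product is an assumption (the text states only that an ATG start codon precedes the tat codons).

```python
def residues(first: int, last: int) -> int:
    """Number of residues (or codons) in the inclusive range first..last."""
    return last - first + 1


def coding_bp(first: int, last: int) -> int:
    """Base pairs needed to encode codons first..last (3 bp per codon)."""
    return 3 * residues(first, last)


TAT_RANGE = (47, 58)            # tat residues encoded by the synthetic oligonucleotides
VP16_OLIGO_RANGE = (43, 47)     # VP16 codons carried by the synthetic fragment
VP16_FRAGMENT_RANGE = (48, 412) # VP16 codons carried by the PvuII-BglII fragment
PVUII_BGLII_FRAGMENT_BP = 1134  # fragment size given in the text


def fusion_summary() -> dict:
    tat = residues(*TAT_RANGE)
    vp16 = residues(VP16_OLIGO_RANGE[0], VP16_FRAGMENT_RANGE[1])
    fragment_coding = coding_bp(*VP16_FRAGMENT_RANGE)
    return {
        "tat_residues": tat,
        "vp16_residues": vp16,
        # Assumption: the initiator Met is counted as one extra residue.
        "total_residues_with_start_met": 1 + tat + vp16,
        "fragment_coding_bp": fragment_coding,
        # Remainder of the fragment (stop codon and flanking sequence).
        "fragment_other_bp": PVUII_BGLII_FRAGMENT_BP - fragment_coding,
    }


if __name__ == "__main__":
    for key, value in fusion_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions the 365 VP16 codons in the third fragment account for 1095 of its 1134 base pairs, which is consistent with the fragment size given.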

We constructed pXB324R2 by inserting into
pXB100 a 1342 base pair BamHI-AatII fragment, from
pXB324, encoding the N-terminal 419 amino acids of
HSV-2 VP16. To provide an in-frame stop codon, we used
a 73 base pair AatII-EcoRI fragment from pSV2-CAT (C.M.
Gorman et al., Molecular & Cellular Biology, 2, pp.
1044-51 (1982)). Thus, pXB324R2 encoded the first 419
amino acids of HSV-2 VP16 and an additional seven
non-VP16 amino acids preceding the stop codon. To
construct pXB324R4, we carried out a 3-piece ligation
involving a 5145 base pair MluI-EcoRI fragment from
pXB324R2, and two insert fragments. One insert was a
115 base pair MluI-NspI fragment from pXB324R2,
encoding the first 198 residues of VP16. The second
insert fragment was a double-stranded synthetic DNA
molecule consisting of the synthetic oligonucleotides
374.32 (SEQ ID NO:41) and 374.33 (SEQ ID NO:42). When
annealed, these oligonucleotides formed a 5' NspI
sticky end and a 3' EcoRI sticky end. This synthetic
fragment encoded VP16 residues 399-412, followed by a
termination codon. Thus, plasmid pXB324R4 differed
from pXB324R2 by lacking codons for VP16 amino acids
413-419 and the seven extraneous amino acids preceding
the stop codon.

Purification of tat-VP16R.GF Fusion Protein

We expressed our genetic construct for
tat-VP16R.GF in E. coli. We harvested the transformed
E. coli by centrifugation; resuspended the cells in 8-10
volumes of lysis buffer (25 mM Tris (pH 7.5), 25 mM
NaCl, 1 mM DTT, 0.5 mM EDTA, 1 mM PMSF, 4 µg/ml E64, 50
µg/ml aprotinin, 50 µg/ml pepstatin A, and 3 mM
benzamidine); lysed the cells in a French press (2
passes at 12,000 psi); and centrifuged the lysate at
10,000 to 12,000 x g for 1 hour, at 4°C. Following
centrifugation of the lysate, we loaded the supernatant
onto a Fast Q-Sepharose column equilibrated with 25 mM
Tris (pH 7.5), 0.5 mM EDTA. We loaded the Q-Sepharose
flow-through onto a Fast S-Sepharose column
equilibrated in 25 mM MES (pH 6.0), 0.1 mM EDTA, 2 mM
DTT. We recovered the tat-VP16 fusion protein from the
S-Sepharose column with sequential NaCl concentration
steps in 25 mM MES (pH 6.0), 0.1 mM EDTA, 2 mM DTT.
The tat-VP16 fusion protein eluted in the 600-1000 mM
NaCl fractions.
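
As a worked illustration of the lysis buffer recipe above, the following Python sketch computes how much of each concentrated stock would be needed for a given buffer volume, using the dilution relation C1 x V1 = C2 x V2. The final concentrations are those given in the text; the stock concentrations are hypothetical values chosen for the example and are not taken from the specification. The protease inhibitors specified in µg/ml (E64, aprotinin, pepstatin A) are omitted.

```python
# Final concentrations (mM) of the millimolar components, from the text.
FINAL_MM = {
    "Tris (pH 7.5)": 25.0,
    "NaCl": 25.0,
    "DTT": 1.0,
    "EDTA": 0.5,
    "PMSF": 1.0,
    "benzamidine": 3.0,
}

# Hypothetical stock concentrations (mM); assumptions for illustration only.
STOCK_MM = {
    "Tris (pH 7.5)": 1000.0,
    "NaCl": 5000.0,
    "DTT": 1000.0,
    "EDTA": 500.0,
    "PMSF": 100.0,
    "benzamidine": 1000.0,
}


def stock_volumes_ml(total_ml: float) -> dict:
    """Volume of each stock (ml) for total_ml of buffer: V1 = C2 * V2 / C1."""
    return {name: FINAL_MM[name] * total_ml / STOCK_MM[name] for name in FINAL_MM}


def buffer_volume_range_ml(pellet_ml: float) -> tuple:
    """The text resuspends cells in 8-10 volumes of lysis buffer."""
    return (8 * pellet_ml, 10 * pellet_ml)


if __name__ == "__main__":
    for name, vol in stock_volumes_ml(100.0).items():
        print(f"{name}: {vol:.2f} ml of stock per 100 ml buffer")
    print("Buffer for a 5 ml cell pellet (ml):", buffer_volume_range_ml(5.0))
```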

VP16 Repression Assay 

We seeded HeLa cells in 24-well culture
plates at 10^[exponent illegible] cells/well. The following day, we
transfected the cells, using the DEAE-dextran method,
as described by B.R. Cullen, "Use of Eukaryotic
Expression Technology in the Functional Analysis of
Cloned Genes", Meth. Enzymol., vol. 152, p. 684 (1987).
We precipitated the DNA for the transfections and
redissolved it, at a concentration of approximately 100
µg/ml, in 100 mM NaCl, 10 mM Tris (pH 7.5). For each
transfection, the DNA-DEAE mix consisted of: 200 ng
p175kCAT (+/- 1 ng pXB324) or 200 ng pSV-CAT (control),
1 mg/ml DEAE-dextran, and PBS, to a final volume of 100
µl. We exposed the cells to this mixture for 15-20
minutes, at 37°C, with occasional rocking of the
culture plates. We then added to each well 1 ml fresh
DC medium (DMEM + 10% serum) with 80 µM chloroquine.
After incubating the cells at 37°C for 2.5 hours, we
aspirated the medium from each well and replaced it
with fresh DC containing 10% DMSO. After 2.5 minutes
at room temperature, we aspirated the DMSO-containing
medium and replaced it with fresh DC containing 0, 10
or 50 µg/ml purified tat-VP16.GF. The following day,
we replaced the medium in each well with fresh medium
of the same composition. Twenty-four hours later, we
lysed the HeLa cells with 0.65% NP-40 (detergent) in 10
mM Tris (pH 8.0), 1 mM EDTA, 150 mM NaCl. We measured
the protein concentration in each extract, for sample
normalization in the assay.

At a tat-VP16.GF concentration of 50 µg/ml,
cellular toxicity interfered with the assay. At a
concentration of 10 µg/ml, the tat-VP16.GF fusion
protein yielded almost complete repression of
VP16-dependent CAT expression, with no visible cell death
and approximately 30% repression of non-VP16-dependent
CAT expression in controls. Thus, we observed specific
repression of VP16-dependent transactivation in
addition to a lesser amount of non-specific repression.
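
The comparison of specific and non-specific repression described above can be expressed as a small calculation. The Python sketch below normalizes a CAT signal to the protein content of the extract and then expresses repression as a percentage of the untreated value. The function names and all numerical inputs are hypothetical and serve only to illustrate the arithmetic; they are not data from the assay.

```python
def normalized_activity(cat_signal: float, protein_mg: float) -> float:
    """CAT signal per mg of extract protein (sample normalization)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return cat_signal / protein_mg


def percent_repression(untreated: float, treated: float) -> float:
    """Percent reduction of normalized activity relative to the untreated sample."""
    if untreated <= 0:
        raise ValueError("untreated activity must be positive")
    return 100.0 * (1.0 - treated / untreated)


if __name__ == "__main__":
    # Hypothetical readings: (CAT signal, mg protein) without and with conjugate.
    vp16_dependent = (normalized_activity(1000.0, 0.5), normalized_activity(70.0, 0.5))
    control = (normalized_activity(800.0, 0.5), normalized_activity(560.0, 0.5))
    print(f"VP16-dependent reporter: {percent_repression(*vp16_dependent):.0f}% repression")
    print(f"Control reporter: {percent_repression(*control):.0f}% repression")
```

With these invented inputs the control reporter shows 30% repression, matching the level of non-specific repression reported in the text, while the VP16-dependent reporter is repressed far more strongly.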

EXAMPLE 17 

Transport Polypeptide - DNA Conjugates

Transcriptional activation by a DNA-binding
transcription factor can be inhibited by introducing
into cells DNA having the binding site for that
transcription factor. The transcription factor becomes
bound by the introduced DNA and is rendered unavailable
to bind at the promoter site where it normally
functions. This strategy has been employed to inhibit
transcriptional activation by NF-κB (Bielinska et
al., "Regulation of Gene Expression with Double-
Stranded Phosphorothioate Oligonucleotides", Science,
vol. 250, pp. 997-1000 (1990)). Bielinska et al.
observed dose-dependent inhibition when the double
stranded DNA was put in the cell culture medium. We
conjugated the transport polypeptide tat 37-72 to the
double stranded DNA molecule to determine whether such
conjugation would enhance the inhibition by increasing
the cellular uptake of the DNA.

We purchased four custom-synthesized 39-mer
phosphorothioate oligonucleotides designated NF1, NF2,
NF3 and NF4, having nucleotide sequences (SEQ ID
NO:43), (SEQ ID NO:44), (SEQ ID NO:45) and (SEQ ID
NO:46), respectively. NF1 and NF2 form a duplex
corresponding to the wild type NF-κB binding site. NF3
and NF4 form a duplex corresponding to a mutant NF-κB
binding site.

We dissolved NF1 and NF3 in water, at a
concentration of approximately 4 [units illegible]. We then put
[illegible] triethanolamine (pH [illegible]), 50 mM NaCl,
10 [units illegible] Traut's reagent. We allowed the
reaction to proceed for 50 minutes [illegible]. We stopped
the reaction [illegible] to remove excess Traut's reagent.
We monitored 260 nm absorbance to identify the
oligonucleotide-containing fractions. Our recovery of the
oligonucleotides was [illegible]. We annealed
Traut-modified NF1 [illegible] concentration, and annealed
Traut-modified NF3 with NF4 (0.50 mg/ml final
concentration). Finally, we allowed 0.4 mg of each
Traut-modified DNA to react with 0.6 mg of tat37-72-
BMH (prepared as described in Example 9, above), in 1
ml of 100 mM HEPES (pH 7.5), [illegible] minutes at room
temperature. We monitored the extent of the
cross-linking reaction by polyacrylamide gel electrophoresis
followed by ethidium bromide staining of the gel. In
general, we observed that about 50% of the DNA was
modified under these conditions.

These double-stranded DNA molecules were
tested, essentially according to the methods of
Bielinska et al., [illegible]
for inhibition of NF-κB transcriptional activation.
[illegible] significantly enhanced the transactivation
by NF-κB.

Recombinant DNA sequences prepared by the
processes described herein are exemplified by a culture
deposited in the American Type Culture Collection,
Rockville, Maryland. The Escherichia coli culture
identified as pJB106 was deposited on July 28, 1993 and
assigned ATCC accession number 69368.

While we have described a number of
embodiments of this invention, it is apparent that our
basic constructions can be altered to provide other
embodiments that utilize the processes and products of
this invention. Therefore, it will be appreciated that
the scope of this invention is to be defined by the
appended claims rather than by the specific embodiments
that have been presented by way of example.

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro
20

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Gly Gly Cys
20

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Cys Gly Gly Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Gly Gly Cys
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser
1 5 10 15

Gln Pro Lys Thr Ala Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly
20 25 30

Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr
35 40 45

His Gln Val Ser Leu Ser Lys Gln
50 55
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 39 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GATCCCAGAC CCACCAGGTT TCTCTGTCGG GCCCTTAAG
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 
AATTCTTAAG GGCCCGACAG AGAAACCTGG TGGGTCTGG 
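
SEQ ID NO: 8 and SEQ ID NO: 9 can be checked for complementarity with a few lines of code. The Python sketch below (helper names are hypothetical) confirms that the two 39-mers pair over 35 bases and leave 4-base 5' overhangs, GATC on one strand and AATT on the other. Overhangs of this kind would be compatible with BamHI-cut and EcoRI-cut ends respectively; the intended use of the duplex is not stated in this part of the listing, so that reading is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as printed in the listing, with the spacing between blocks removed.
SEQ_ID_8 = "GATCCCAGACCCACCAGGTTTCTCTGTCGGGCCCTTAAG"
SEQ_ID_9 = "AATTCTTAAGGGCCCGACAGAGAAACCTGGTGGGTCTGG"


def annealed_duplex(top: str, bottom: str, overhang: int = 4) -> tuple:
    """Return (top 5' overhang, paired core, bottom 5' overhang).

    Raises ValueError if the strands are not complementary once the first
    `overhang` bases of each strand are set aside.
    """
    core_top = top[overhang:]
    core_from_bottom = reverse_complement(bottom[overhang:])
    if core_top != core_from_bottom:
        raise ValueError("strands are not complementary over the core region")
    return top[:overhang], core_top, bottom[:overhang]


if __name__ == "__main__":
    five_prime_top, core, five_prime_bottom = annealed_duplex(SEQ_ID_8, SEQ_ID_9)
    print("Top-strand 5' overhang:", five_prime_top)
    print("Paired region length:", len(core))
    print("Bottom-strand 5' overhang:", five_prime_bottom)
```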
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5098 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



TTGAAGACGA AAGGGCCTCG TGATACGCCT ATTTTTATAG GTTAATGTCA TGATAATAAT 60

GGTTTCTTAG ACGTCAGGTG GCACTTTTCG GGGAAATGTG CGCGGAACCC CTATTTGTTT 120

ATTTTTCTAA ATACATTCAA ATATGTATCC GCTCATGAGA CAATAACCCT GATAAATGCT 180

TCAATAATAT TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG CCCTTATTCC 240

CTTTTTTGCG GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG TGAAAGTAAA 300

AGATGCTGAA GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC TCAACAGCGG 360

TAAGATCCTT GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA CTTTTAAAGT 420

TCTGCTATGT GGCGCGGTAT TATCCCGTGT TGACGCCGGG CAAGAGCAAC TCGGTCGCCG 480

CATACACTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA AGCATCTTAC 540

GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG ATAACACTGC 600

GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT TTTTGCACAA 660

CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG AAGCCATACC 720

AAACGACGAG CGTGACACCA CGATGCCTGC AGCAATGGCA ACAACGTTGC GCAAACTATT 780


AACTGGCGAA [illegible] [illegible] GCAACAATTA ATAGACTGGA TGGAGGCGGA 840

TAAAGTTGCA [illegible] [illegible] CCTTCCGGCT GGCTGGTTTA TTGCTGATAA 900

ATCTGGAGCC GGTGAGCGTG [illegible] [illegible] [illegible] CAGATGGTAA 960

GCCCTCCCGT ATCGTAGTTA [illegible] [illegible] [illegible] ATGAACGAAA 1020

TAGACAGATC GCTGAGATAG GTGCCTCACT [illegible] [illegible] CAGACCAAGT 1080

[illegible] ATACTTTAGA TTGATTTAAA [illegible] [illegible] GGATCTAGGT 1140

[illegible] TTTGATAATC TCATGACCAA [illegible] [illegible] CGTTCCACTG 1200

AGCGTCAGAC CCCGTAGAAA AGATCAAACC [illegible] [illegible] TTCTGCGCGT 1260

AATCTGCTGC TTGCAAACAA [illegible] [illegible] GTGGTTTGTT TGCCGGATCA 1320

AGAGCTACCA [illegible] [illegible] [illegible] AGAGCGCAGA TACCAAATAC 1380

TGTCCTTCTA GTGTAGCCGT [illegible] CCACTTCAAG AACTCTGTAG CACCGCCTAC 1440

ATACCTCGCT [illegible] [illegible] GGCTGCTGCC [illegible] AGTCGTGTCT 1500

TACCGGGTTG [illegible] [illegible] GGATAAGGCG CAGCGGTCGG [illegible] 1560

GGGTTCGTGC [illegible] [illegible] AACGACCTAC ACCGAACTGA GATACCTACA 1620

GCGTGAGCAT [illegible] [illegible] CGAAGGGAGA AAGGCGGACA GGTATCCGGT 1680

AAGCGGCAGG GTCGGAACAC [illegible] GAGGGAGCTT CCAGGGGGAA ACGCCTGGTA 1740

TCTTTATAGT CCTGTCGGGT [illegible] [illegible] [illegible] TGTGATGCTC 1800

GTCAGGGGGG CGGAGCCTAT [illegible] [illegible] [illegible] GGTTCCTGGC 1860

CTTTTGCTGG CCTTTTGCTC [illegible] [illegible] [illegible] CTGTGGATAA 1920

CCGTATTACC GCCTTTGAGT [illegible] [illegible] [illegible] CCGAGCGCAG 1980

CGAGTCAGTG AGCGAGGAAG [illegible] [illegible] TATTTTCTCC TTACGCATCT 2040

GTGCGGTATT TCACACCGCA TATATGCTGC ACTCTCAGTA CAATCTGCTC TGATGCCGCA 2100

TAGTTAAGCC AGTATACACT CCGCTATCGC TACGTGACTG GGTCATGGCT GCGCCCCGAC 2160

ACCCGCCAAC ACCCGCTGAC GCGCCCTGAC GGGCTTGTCT GCTCCCGGCA TCCGCTTACA 2220

[illegible] GACCGTCTCC GGGAGCTGCA TGTGTCAGAG GTTTTCACCG TCATCACCGA 2280

AACGCGCGAG GCAGCTGCGG TAAAGCTCAT CAGCGTGGTC GTGAAGCGAT TCACAGATGT 2340

CTGCCTGTTC ATCCGCGTCC AGCTCGTTGA GTTTCTCCAG AAGCGTTAAT GTCTGGCTTC 2400

TGATAAAGCG GGCCATGTTA AGGGCGGTTT TTTCCTGTTT GGTCACTTGA TGCCTCCGTG 2460

TAAGGGGGAA TTTCTGTTCA TGGGGGTAAT GATACCGATG AAACGAGAGA GGATGCTCAC 2520 

GATACGGGTT ACTGATGATG AACATGCCCG GTTACTGGAA CGTTGTGAGG GTAAACAACT 2580 

GGCGGTATGG ATGCGGCGGG ACCAGAGAAA AATCACTCAG GGTCAATGCC AGCGCTTCGT 2640 

TAATACAGAT GTAGGTGTTC CACAGGGTAG CCAGCAGCAT CCTGCGATGC AGATCCGGAA 2700 

CATAATGGTG CAGGGCGCTG ACTTCCGCGT TTCCAGACTT TACGAAACAC GGAAACCGAA 2760 

GACCATTCAT GTTGTTGCTC AGGTCGCAGA CGTTTTGCAG CAGCAGTCGC TTCACGTTCG 2820 

CTCGCGTATC GGTGATTCAT TCTGCTAACC AGTAAGGCAA CCCCGCCAGC CTAGCCGGGT 2880 

CCTCAACGAC AGGAGCACGA TCATGCGCAC CCGTGGCCAG GACCCAACGC TGCCCGAGAT 2940 

GCGCCGCGTG CGGCTGCTGG AGATGGCGGA CGCGATGGAT ATGTTCTGCC AAGGGTTGGT 3000 

TTGCGCATTC ACAGTTCTCC GCAAGAATTG ATTGGCTCCA ATTCTTGGAG TGGTGAATCC 3060 

GTTAGCGAGG TGCCGCCGGC TTCCATTCAG GTCGAGGTGG CCCGGCTCCA TGCACCGCGA 3120 

CGCAACGCGG GGAGGCAGAC AAGGTATAGG GCGGCGCCTA CAATCCATGC CAACCCGTTC 3180 

CATGTGCTCG CCGAGGCGGC ATAAATCGCC GTGACGATCA GCGGTCCAGT GATCGAAGTT 3240 

AGGCTGGTAA GAGCCGCGAG CGATCCTTGA AGCTGTCCCT GATGGTCGTC ATCTACCTGC 3300 

CTGGACAGCA TGGCCTGCAA CGCGGGCATC CCGATGCCGC CGGAAGCGAG AAGAATCATA 3360

ATGGGGAAGG CCATCCAGCC TCGCGTCGCG AACGCCAGCA AGACGTAGCC CAGCGCGTCG 3420 

GCCGCCATGC CGGCGATAAT GGCCTGCTTC TCGCCGAAAC GTTTGGTGGC GGGACCAGTG 3480

ACGAAGGCTT GAGCGAGGGC GTGCAAGATT CCGAATACCG CAAGCGACAG GCCGATCATC 3540 

GTCGCGCTCC AGCGAAAGCG GTCCTCGCCG AAAATGACCC AGAGCGCTGC CGGCACCTGT 3600 

CCTACGAGTT GCATGATAAA GAAGACAGTC ATAAGTGCGG CGACGATAGT CATGCCCCGC 3660

GCCCACCGGA AGGAGCTGAC TGGGTTGAAG GCTCTCAAGG GCATCGGTCG ACGCTCTCCC 3720 

TTATGCGACT CCTGCATTAG GAAGCAGCCC AGTAGTAGGT TGAGGCCGTT GAGCACCGCC 3780 

GCCGCAAGGA ATGGTGCATG CAAGGAGATG GCGCCCAACA GTCCCCCGGC CACGGGGCCT 3840 

GCCACCATAC CCACGCCGAA ACAAGCGCTC ATGAGCCCGA AGTGGCGAGC CCGATCTTCC 3900 

CCATCGGTGA TGTCGGCGAT ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG 3960 

GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC CGCGAAATTA ATACGACTCA 4020 

CTATAGGGAG ACCACAACGG TTTCCCTCTA GAAATAATTT TGTTTAACTT TAAGAAGGAG 4080

ATATACATAT GGAACCGGTC GACCCGCGTC TGGAACCATG GAAACACCCC GGGTCCCAGC 4140

CGAAAACCGC GTGCACCAAC TGCTACTGCA AAAAATGCTG CTTCCACTGC CAGGTTTGCT 4200 

TCATCACCAA AGCCCTAGGT ATCTCTTACG GCCGTAAAAA ACGTCGTCAG CGACGTCGTC 4260 

CGCCGCAGGG ATCCCAGACC CACCAGGTTT CTCTGTCGGG CCCGGCGGAC AGCGGCGACG 4320 

CCCTGCTGGA GCGCAACTAT CCCACTGGCG CGGAGTTCCT CGGCGACGGC GGCGACGTCA 4380 

GCTTCAGCAC CCGCGGCACG CAGAACTGGA CGGTGGAGCG GCTGCTCCAG GCGCACCGCC 4440 

AACTGGAGGA GCGCGGCTAT GTGTTCGTCG GCTACCACGG CACCTTCCTC GAAGCGGCGC 4500 

AAAGCATCGT CTTCGGCGGG GTGCGCGCGC GCAGCCAGGA CCTCGACGCG ATCTGGCGCG 4560 

GTTTCTATAT CGCCGGCGAT CCGGCGCTGG CCTACGGCTA CGCCCAGGAC CAGGAACCCG 4620 

ACGCACGCGG CCGGATCCGC AACGGTGCCC TGCTGCGGGT CTATGTGCCG CGCTCGAGCC 4680 

TGCCGGGCTT CTACCGCACC AGCCTGACCC TGGCCGCGCC GGAGGCGGCG GGCGAGGTCG 4740 

AACGGCTGAT CGGCCATCCG CTGCCGCTGC GCCTGGACGC CATCACCGGC CCCGAGGAGG 4800 

AAGGCGGGCG CCTGGAGACC ATTCTCGGCT GGCCGCTGGC CGAGCGCACC GTGGTGATTC 4860 

CCTCGGCGAT CCCCACCGAC CCGCGCAACG TCGGCGGCGA CCTCGACCCG TCCAGCATCC 4920 

CCGACAAGGA ACAGGCGATC AGCGCCCTGC CGGACTACGC CAGCCAGCCC GGCAAACCGC 4980 

CGCGCGAGGA CCTGAAGTAA CTGCCGCGAC CGGCCGGCTC CCTTCGCAGG AGCCGGCCTT 5040 

CTCGGGGCCT GGCCATACAT CAGGTTTTCC TGATGCCAGC CCAATCGAAT ATGAATTC 5098 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4910 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TTGAAGACGA AAGGGCCTCG TGATACGCCT ATTTTTATAG GTTAATGTCA TGATAATAAT 60 

GGTTTCTTAG ACGTCAGGTG GCACTTTTCG GGGAAATGTG CGCGGAACCC CTATTTGTTT 120 

ATTTTTCTAA ATACATTCAA ATATGTATCC GCTCATGAGA CAATAACCCT GATAAATGCT 180 

TCAATAATAT TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG CCCTTATTCC 240

CTTTTTTGCG GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG TGAAAGTAAA 300

AGATGCTGAA GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC TCAACAGCGG 360 

TAAGATCCTT GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA CTTTTAAAGT 420 

TCTGCTATGT GGCGCGGTAT TATCCCGTGT TGACGCCGGG CAAGAGCAAC TCGGTCGCCG 480 

CATACACTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA AGCATCTTAC 540 

GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG ATAACACTGC 600 

GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT TTTTGCACAA 660 

CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG AAGCCATACC 720 

AAACGACGAG CGTGACACCA CGATGCCTGC AGCAATGGCA ACAACGTTGC GCAAACTATT 780 

AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA TGGAGGCGGA 840 

TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTGGTTTA TTGCTGATAA 900 

ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTGGGGC CAGATGGTAA 960 

GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG ATGAACGAAA 1020 

TAGACAGATC GCTGAGATAG GTGCCTCACT GATTAAGCAT TGGTAACTGT CAGACCAAGT 1080 

TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA GGATCTAGGT 1140 

GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT CGTTCCACTG 1200 

AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT TTCTGCGCGT 1260

AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG GTGGTTTGTT TGCCGGATCA 1320 

AGAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC AGAGCGCAGA TACCAAATAC 1380 

TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG CACCGCCTAC 1440 

ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC AGTGGCGATA AGTCGTGTCT 1500 

TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG GCTGAACGGG 1560 

GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC ACCGAACTGA GATACCTACA 1620 

GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA GGTATCCGGT 1680 

AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGCGAGCTT CCAGGGGGAA ACGCCTGGTA 1740 

TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT TGTGATGCTC 1800 

GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG GCCTTTTTAC GGTTCCTGGC 1860 

CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT CTGTGGATAA 1920

CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC AGCCGAACGA CCGAGCGCAG 1980

CGAGTCAGTG AGCGAGGAAG CGGAAGAGCG CCTGATGCGG TATTTTCTCC TTACGCATCT 2040

GTGCGGTATT TCACACCGCA TATATGGTGC ACTCTCAGTA CAATCTGCTC TGATGCCGCA 2100

TAGTTAAGCC AGTATACACT CCGCTATCGC TACGTGACTG GGTCATGGCT GCGCCCCGAC 2160

ACCCGCCAAC ACCCGCTGAC GCGCCCTGAC GGGCTTGTCT GCTCCCGGCA TCCGCTTACA 2220

GACAAGCTGT GACCGTCTCC GGGAGCTGCA TGTGTCAGAG GTTTTCACCG TCATCACCGA 2280

AACGCGCGAG GCAGCTGCGG TAAAGCTCAT CAGCGTGGTC GTGAAGCGAT TCACAGATGT 2340

CTGCCTGTTC ATCCGCGTCC AGCTCGTTGA GTTTCTCCAG AAGCGTTAAT GTCTGGCTTC 2400

TGATAAAGCG GGCCATGTTA AGGGCGGTTT TTTCCTGTTT GGTCACTTGA TGCCTCCGTG 2460

TAAGGGGGAA TTTCTGTTCA TGGGGGTAAT GATACCGATG AAACGAGAGA GGATGCTCAC 2520

GATACGGGTT ACTGATGATG AACATGCCCG GTTACTGGAA CGTTGTGAGG GTAAACAACT 2580

GGCGGTATGG ATGCGGCGGG ACCAGAGAAA AATCACTCAG GGTCAATGCC AGCGCTTCGT 2640

TAATACAGAT GTAGGTGTTC CACAGGGTAG CCAGCAGCAT CCTGCGATGC AGATCCGGAA 2700

CATAATGGTG CAGGGCGCTG ACTTCCGCGT TTCCAGACTT TACGAAACAC GGAAACCGAA 2760

GACCATTCAT GTTGTTGCTC AGGTCGCAGA CGTTTTGCAG CAGCAGTCGC TTCACGTTCG 2820

CTCGCGTATC GGTGATTCAT TCTGCTAACC AGTAAGGCAA CCCCGCCAGC CTAGCCGGGT 2880

CCTCAACGAC AGGAGCACGA TCATGCGCAC CCGTGGCCAG GACCCAACGC TGCCCGAGAT 2940

GCGCCGCGTG CGGCTGCTGG AGATGGCGGA CGCGATGGAT ATGTTCTGCC AAGGGTTGGT 3000

TTGCGCATTC ACAGTTCTCC GCAAGAATTG ATTGGCTCCA ATTCTTGGAG TGGTGAATCC 3060

GTTAGCGAGG TGCCGCCGGC TTCCATTCAG GTCGAGGTGG CCCGGCTCCA TGCACCGCGA 3120

CGCAACGCGG GGAGGCAGAC AAGGTATAGG GCGGCGCCTA CAATCCATGC CAACCCGTTC 3180

CATGTGCTCG CCGAGGCGGC ATAAATCGCC GTGACGATCA GCGGTCCAGT GATCGAAGTT 3240

AGGCTGGTAA GAGCCGCGAG CGATCCTTGA AGCTGTCCCT GATGGTCGTC ATCTACCTGC 3300

CTGGACAGCA TGGCCTGCAA CGCGGGCATC CCGATGCCGC CGGAAGCGAG AAGAATCATA 3360

ATGGGGAAGG CCATCCAGCC TCGCGTCGCG AACGCCAGCA AGACGTAGCC CAGCGCGTCG 3420

GCCGCCATGC CGGCGATAAT GGCCTGCTTC TCGCCGAAAC GTTTGGTGGC GGGACCAGTG 3480

ACGAAGGCTT GAGCGAGGGC GTGCAAGATT CCGAATACCG CAAGCGACAG GCCGATCATC 3540

GTCGCGCTCC AGCGAAAGCG GTCCTCGCCG AAAATGACCC AGAGCGCTGC CGGCACCTGT 3600

CCTACGAGTT GCATGATAAA GAAGACAGTC ATAAGTGCGG CGACGATAGT CATGCCCCGC 3660

GCCCACCGGA AGGAGCTGAC TGGGTTGAAG GCTCTCAAGG GCATCGGTCG ACGCTCTCCC 3720 

TTATGCGACT CCTGCATTAG GAAGCAGCCC AGTAGTAGGT TGAGGCCGTT GAGCACCGCC 3780 

GCCGCAAGGA ATGGTGCATG CAAGGAGATG GCGCCCAACA GTCCCCCGGC CACGGGGCCT 3840 

GCCACCATAC CCACGCCGAA ACAAGCGCTC ATGAGCCCGA AGTGGCGAGC CCGATCTTCC 3900 

CCATCGGTGA TGTCGGCGAT ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG 3960 

GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC CGCGAAATTA ATACGACTCA 4020 

CTATAGGGAG ACCACAACGG TTTCCCTCTA GAAATAATTT TGTTTAACTT TAAGAAGGAG 4080 

ATATATATGG AACCGGTCGT TTCTCTGTCG GGCCCGGCGG ACAGCGGCGA CGCCCTGCTG 4140 

GAGCGCAACT ATCCCACTGG CGCGGAGTTC CTCGGCGACG GCGGCGACGT CAGCTTCAGC 4200 

ACCCGCGGCA CGCAGAACTG GACGGTGGAG CGGCTGCTCC AGGCGCACCG CCAACTGGAG 4260 

GAGCGCGGCT ATGTGTTCGT CGGCTACCAC GGCACCTTCC TCGAAGCGGC GCAAAGCATC 4320 

GTCTTCGGCG GGGTGCGCGC GCGCAGCCAG GACCTCGACG CGATCTGGCG CGGTTTCTAT 4380 

ATCGCCGGCG ATCCGGCGCT GGCCTACGGC TACGCCCAGG ACCAGGAACC CGACGCACGC 4440 

GGCCGGATCC GCAACGGTGC CCTGCTGCGG GTCTATGTGC CGCGCTCGAG CCTGCCGGGC 4500 

TTCTACCGCA CCAGCCTGAC CCTGGCCGCG CCGGAGGCGG CGGGCGAGGT CGAACGGCTG 4560 

ATCGGCCATC CGCTGCCGCT GCGCCTGGAC GCCATCACCG GCCCCGAGGA GGAAGGCGGG 4620 

CGCCTGGAGA CCATTCTCGG CTGGCCGCTG GCCGAGCGCA CCGTGGTGAT TCCCTCGGCG 4680 

ATCCCCACCG ACCCGCGCAA CGTCGGCGGC GACCTCGACG CGTCCAGCAT CCCCGACAAG 4740 

GAACAGGCGA TCAGCGCCCT GCCGGACTAC GCCAGCCAGC CCGGCAAACC GCCGCGCGAG 4800 

GACCTGAAGT AACTGCCGCG ACCGGCCGGC TCCCTTCGCA GGAGCCGGCC TTCTCGGGGC 4860 

CTGGCCATAC ATCAGGTTTT CCTGATGCCA GCCCAATCGA ATATGAATTC 4910 
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TATGGAACCG GTCGTTTCTC TGTCGGGCC 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CGACAGAGAA ACGACCGGTT CCA 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4977 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100 

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160 

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220 

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280 

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTGCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACC ATGGTACCAG ACACCGGAAA CCCCTGCCAC ACCACTAAGT TGTTGCACAG 4140 

AGACTCAGTG GACAGTGCTC CAATCCTCAC TGCATTTAAC AGCTCACACA AAGGACGGAT 4200 

TAACTGTAAT AGTAACACTA CACCCATAGT ACATTTAAAA GGTGATGCTA ATACTTTAAA 4260 

ATGTTTAAGA TATAGATTTA AAAAGCATTG TACATTGTAT ACTGCAGTGT CGTCTACATG 4320 

GCATTGGACA GGACATAATG TAAAACATAA AAGTGCAATT GTTACACTTA CATATGATAG 4380 

TGAATGGCAA CGTGACCAAT TTTTGTCTCA AGTTAAAATA CCAAAAACTA TTACAGTGTC 4440 

TACTGGATTT ATGTCTATAT GAGGATCCGG CTGCTAACAA AGCCCGAAAG GAAGCTGAGT 4500 

TGGCTGCTGC CACCGCTGAG CAATAACTAG CATAACCCCT TGGGGCCTCT AAACGGGTCT 4560

TGAGGGGTTT TTTGCTGAAA GGAGGAACTA TATCCGGATA TCCACAGGAC GGGTGTGGTC 4620 

GCCATGATCG CGTAGTCGAT AGTGGCTCCA AGTAGCGAAG CGAGCAGGAC TGGGCGGCGG 4680 

CCAAAGCGGT CGGACAGTGC TCCGAGAACG GGTGCGCATA GAAATTGCAT CAACGCATAT 4740 

AGCGCTAGCA GCACGCCATA GTGACTGGCG ATGCTGTCGG AATGGACGAT ATCCCGCAAG 4800 

AGGCCCGGCA GTACCGGCAT AACCAAGCCT ATGCCTACAG CATCCAGGGT GACGGTGCCG 4860 

AGGATGACGA TGAGCGCATT GTTAGATTTC ATACACGGTG CCTGACTGCG TTAGCAATTT 4920 

AACTGTGATA AACTACCGCA TTAAAGCTTA TCGATGATAA GCTGTCAAAC ATGAGAA 4977 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CTCCCATGGT ACCAGACACC GGAAACC 27

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GGGGGATCCT CATATAGACA TAAATCC 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4977 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100 

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160 

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220 

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280 

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340 

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400 

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460 

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520 

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580 

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640 

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTGCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACC ATGGTACCAG ACACCGGAAA CCCCTCCCAC ACCACTAAGT TGTTGCACAG 4140

AGACTCAGTG GACAGTGCTC CAATCCTCAC TGCATTTAAC AGCTCACACA AAGGACGGAT 4200

TAACTGTAAT AGTAACACTA CACCCATAGT ACATTTAAAA GGTGATGCTA ATACTTTAAG 4260

ATCTTTAAGA TATAGATTTA AAAAGCATTC TACATTGTAT ACTGCAGTGT CGTCTACATG 4320

GCATTGGACA GGACATAATG TAAAACATAA AAGTGCAATT GTTACACTTA CATATGATAG 4380

TGAATGGCAA CGTGACCAAT TTTTGTCTCA AGTTAAAATA CCAAAAACTA TTACAGTGTC 4440

TACTGGATTT ATGTCTATAT GAGGATCCGG CTGCTAACAA AGCCCGAAAG GAAGCTGAGT 4500

TGGCTGCTGC CACCGCTGAG CAATAACTAG CATAACCCCT TGGGGCCTCT AAACGGGTCT 4560 

TGAGGGGTTT TTTGCTGAAA GGAGGAACTA TATCCGGATA TCCACAGGAC GGGTGTGGTC 4620 

GCCATGATCG CGTAGTCGAT AGTGGCTCCA AGTAGCGAAG CGAGCAGGAC TGGGCGGCGG 4680 

CCAAAGCGGT CGGACAGTGC TCCGAGAACG GGTGCGCATA GAAATTGCAT CAACGCATAT 4740 

AGCGCTAGCA GCACGCCATA GTGACTGGCG ATGCTGTCGG AATGGACGAT ATCCCGCAAG 4800 

AGGCCCGGCA GTACCGGCAT AACCAAGCCT ATGCCTACAG CATCCAGGGT GACGGTGCCG 4860 

AGGATGACGA TGAGCGCATT GTTAGATTTC ATACACGGTG CCTGACTGCG TTAGCAATTT 4920

AACTGTGATA AACTACCGCA TTAAAGCTTA TCGATGATAA GCTGTCAAAC ATGAGAA 4977
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CGACACTGCA GTATACAATG TAGAATGCTT TTTAAATCTA TATCTTAAAG ATCTTAAAG 59 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GCGTCGGCCG CCATGCCGGC GATAAT 26

(2) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4819 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTGCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCGGGCACC 3600

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACA TATGGAACCG GTCGACCCGC GTCTGGAACC ATGGAAACAC CCCGGGTCCC 4140

AGCCGAAAAC CGCGTTCATC ACCAAAGCCC TAGGTATCTC TTACGGCCGT AAAAAACGTC 4200

GTCAGCGACG TCGTCCGCCG CAGGGATCCC AGACCCACCA GGTTTCTCTG TCTAAACAGT 4260

GATCAGCATT GGCTAGCATG ACTGGTGGAC AGCAAATGGG TCGCGGATCC GGCTGCTAAC 4320

AAAGCCCGAA AGGAAGCTGA GTTGGCTGCT GCCACCGCTG AGCAATAACT AGCATAACCC 4380

CTTGGGGCCT CTAAACGGGT CTTGAGGGGT TTTTTGCTGA AAGGAGGAAC TATATCCGGA 4440

TATCCACAGG ACGGGTGTGG TCGCCATGAT CGCGTAGTCG ATAGTGGCTC CAAGTAGCGA 4500

AGCGAGCAGG ACTGGGCGGC GGCCAAAGCG GTCGGACAGT GCTCCGAGAA CGGGTGCGCA 4560

TAGAAATTGC ATCAACGCAT ATAGCGCTAG CAGCACGCCA TAGTGACTGG CGATGCTGTC 4620

GGAATGGACG ATATCCCGCA AGAGGCCCGG CAGTACCGGC ATAACCAAGC CTATGCCTAC 4680

AGCATCCAGG GTGACGGTGC CGAGGATGAC GATGAGCGCA TTGTTAGATT TCATACACGG 4740

TGCCTGACTG CGTTAGCAAT TTAACTGTGA TAAACTACCG CATTAAAGCT TATCGATGAT 4800

AAGCTGTCAA ACATGAGAA 4819
(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
TTTACGGCCG TAAGAGATAC CTAGGGCTTT GGTGATGAAC GCGGT
(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5574 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460 

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520 

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580 

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640 

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700 

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760 

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820 

TCGCTCGCGT ATCGGTGATT CATTCTGCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940 

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000 

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060 

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120 

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180 

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240 

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300 

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360 

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480 

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540 

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600 

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660 

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720 

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780 

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840 

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900 

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960 

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACA TATGGAACCG GTCGACCCGC GTCTGGAACC ATGGAAACAC CCCGGGTCCC 4140

AGCCGAAAAC CGCGTTCATC ACCAAAGCCC TAGGTATCTC TTACGGCCGT AAAAAACGTC 4200

GTCAGCGACG TCGTCCGCCG CAGGGATCTT CCATGGCCGG TGCTGGACGC ATTTACTATT 4260

CTCGCTTTGG TGACGAGGCA GCCAGATTTA GTACAACAGG GCATTACTCT GTAAGAGATC 4320

AGGACAGAGT GTATGCTGGT GTCTCATCCA CCTCTTCTGA TTTTAGAGAT CGCCCAGACG 4380

GAGTCTGGGT CGCATCCGAA GGACCTGAAG GAGACCCTGC AGGAAAAGAA GCCGAGCCAG 4440

CCCAGCCTGT CTCTTCTTTG CTCGGCTCCC CCGCCTGCGG TCCCATCAGA GCAGGCCTCG 4500

GTTGGGTACG GGACGGTCCT CGCTCGCACC CCTACAATTT TCCTGCAGGC TCGGGGGGCT 4560

CTATTCTCCG CTCTTCCTCC ACCCCGGTGC AGGGCACGGT ACCGGTGGAC TTGGCATCAA 4620

GGCAGGAAGA AGAGGAGCAG TCGCCCGACT CCACAGAGGA AGAACCAGTG ACTCTCCCAA 4680

GGCGCACCAC CAATGATGGA TTCCACCTGT TAAAGGCAGG AGGGTCATGC TTTGCTCTAA 4740

TTTCAGGAAC TGCTAACCAG GTAAAGTGCT ATCGCTTTCG GGTGAAAAAG AACCATAGAC 4800

ATCGCTACGA GAACTGCACC ACCACCTGGT TCACAGTTGC TGACAACGGT GCTGAAAGAC 4860

AAGGACAAGC ACAAATACTG ATCACCTTTG GATCGCCAAG TCAAAGGCAA GACTTTCTGA 4920

AACATGTACC ACTACCTCCT GGAATGAACA TTTCCGGCTT TACAGCCAGC TTGGACTTCT 4980

GATCACTGCC ATTGCCTTTT CTTCATCTGA CTGGTGTACT ATGCCAAATC TATGGTTTCT 5040

ATTGTTCTTG GGACTAGGAA GATCCGGCTG CTAACAAAGC CCGAAAGGAA GCTGAGTTGG 5100

CTGCTGCCAC CGCTGAGCAA TAACTAGCAT AACCCCTTGG GGCCTCTAAA CGGGTCTTGA 5160

GGGGTTTTTT GCTGAAAGGA GGAACTATAT CCGGATATCC ACAGGACGGG TGTGGTCGCC 5220

ATGATCGCGT AGTCGATAGT GGCTCCAAGT AGCGAAGCGA GCAGGACTGG GCGGCGGCCA 5280

AAGCGGTCGG ACAGTGCTCC GAGAACGGGT GCGCATAGAA ATTGCATCAA CGCATATAGC 5340

GCTAGCAGCA CGCCATAGTG ACTGGCGATG CTGTCGGAAT GGACGATATC CCGCAAGAGG 5400

CCCGGCAGTA CCGGCATAAC CAAGCCTATG CCTACAGCAT CCAGGGTGAC GGTGCCGAGG 5460

ATGACGATGA GCGCATTGTT AGATTTCATA CACGGTGCCT GACTGCGTTA GCAATTTAAC 5520

TGTGATAAAC TACCGCATTA AAGCTTATCG ATGATAAGCT GTCAAACATG AGAA 5574


(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GATCCCAGAC CCACCAGGTT 
(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
GAACCTGGTG GGTCTGG 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
CGTCCGCCGC AGGGATCGCA GACCCACCAG GTTTCTCTGT CTAAACAGGC 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 58 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

CATGGCCTGT TTAGACAGAG AAACCTGGTG GGTCTGCGAT CCCTGCGGCG GACGACGT 58 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CATGTACGGC CGTAAAAAAC GTCGTCAGCG ACGTCGTCCG CCGGACAC 48 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CGGTGTCCGG CGGACGACGT CGCTGACGAC GTTTTTTACG GCCGTA 46 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
ATCATCGATA AGCTTTAATG CGGTAG 26
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
ACTTTAAGAA GGAGATATAC ATATGTTCAT CACCAAAGCC CTAGGTATCT CT 52 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
ACTTTAAGAA GGAGATATAC ATATGTACGG CCGTAAAAAA CGTCGTCAGC G 51 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
AACGTCGTCA GCGACGTCGT CCGCCGGACA CCGGAAACCC CTGCCACACC AC 52 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CGAAAAGTGC CACCTGACGT CTAAGAAACC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CTCCCATGGC TAGCAACACT ACACCC 26
(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GAAGATCTTC 10 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
CAGAGGAAGC CATGGTGACT CTCCCAA 27
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
AAGGCAATGG ATCCGATCAG AAGTCCA 
(2) INFORMATION FOR SEQ ID NO: 38: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Asp Thr
1 5 10 15

Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp Ser Val Asp
20 25 30

Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys Gly Arg Ile
35 40 45

Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala
50 55 60

Asn Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu
65 70 75 80

Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys
85 90 95

His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg
100 105 110

Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser
115 120 125

Thr Gly Phe Met Ser Ile
130

(2) INFORMATION FOR SEQ ID NO: 39: 

<i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CATGTACGGC CGTAAAAAAC GTCGTCAGCG ACGTCGTCCG CTGAGTCAGG CCCAG 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CTGGGCCTGA CTCAGCGGAC GACGTCGCTG ACGACGTTTT TTACGGCCGT A 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
TCCTTCCTGT CCGCTGGTCA GCGCCCGCGC CGCCTGTCCA CCTAAG
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 
AATTCTTAGG TGGACAGGCG GCGCGGGCGC TGACCAGCGG ACAGGAAGGA CATG 54 
(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GGGGACTTTC CGCTGGGGAC TTTCCACGGG GGACTTTCC 39 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
GGAAAGTCCC CCGTGGAAAG TCCCCAGCGG AAAGTCCCC 39 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GTCTACTTTC CGCTGTCTAC TTTCCACGGT CTACTTTCC 39 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
GGAAAGTAGA CCGTGGAAAG TAGACAGCGG AAAGTAGAC 39 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro
1 5 10
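The three-letter notation used for the peptide entries can be converted to one-letter codes and checked against the stated length. The sketch below does this for SEQ ID NO: 47, written with the standard residue name Gln; the dictionary and the helper name `to_one_letter` are introduced here for illustration only and are not part of the listing.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter codes.

    A KeyError is raised for any token that is not a standard residue
    name, so misspelled residue names are caught rather than skipped.
    """
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


# SEQ ID NO: 47 (stated length: 12 amino acids).
SEQ_ID_47 = "Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro"

one_letter = to_one_letter(SEQ_ID_47)
print(one_letter, len(one_letter))
```

The sketch prints `YGRKKRRQRRRP 12`, consistent with the LENGTH field of the entry.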

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

Gln Thr His Gln Val Ser Leu Ser Lys Gln
20 25

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: 

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser Leu
20 25 30

Ser Lys Gln
35

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro
20



(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 121 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp
1 5 10 15

Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys
20 25 30

Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys
35 40 45

Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His
50 55 60

Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His
65 70 75 80

Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu
85 90 95

Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile
100 105 110

Thr Val Ser Thr Gly Phe Met Ser Ile
115 120

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Pro Gln Gly Ser
20 25

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 85 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala Asn
1 5 10 15

Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr
20 25 30

Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys His
35 40 45

Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp
50 55 60

Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser Thr
65 70 75 80

Gly Phe Met Ser Ile
85

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 121 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp
1 5 10 15

Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys
20 25 30

Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys
35 40 45

Gly Asp Ala Asn Thr Leu Lys Ser Leu Arg Tyr Arg Phe Lys Lys His
50 55 60

Ser Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His
65 70 75 80

Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu
85 90 95

Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile
100 105 110

Thr Val Ser Thr Gly Phe Met Ser Ile
115 120

(2) INFORMATION FOR SEQ ID NO: 56: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 161 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Leu Gly Trp Val Arg Asp Gly Pro Arg Ser His Pro Tyr Asn Phe Pro
1 5 10 15

Ala Gly Ser Gly Gly Ser Ile Leu Arg Ser Ser Ser Thr Pro Val Gln
20 25 30

Gly Thr Val Pro Val Asp Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln
35 40 45

Ser Pro Asp Ser Thr Glu Glu Glu Pro Val Thr Leu Pro Arg Arg Thr
50 55 60

Thr Asn Asp Gly Phe His Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala
65 70 75 80

Leu Ile Ser Gly Thr Ala Asn Gln Val Lys Cys Tyr Arg Phe Arg Val
85 90 95

Lys Lys Asn His Arg His Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe
100 105 110

Thr Val Ala Asp Asn Gly Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu
115 120 125

Ile Thr Phe Gly Ser Pro Ser Gln Arg Gln Asp Phe Leu Lys His Val
130 135 140

Pro Leu Pro Pro Gly Met Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp
145 150 155 160

Phe



(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 249 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 



Met Ala Gly Ala Gly Arg Ile Tyr Tyr Ser Arg Phe Gly Asp Glu Ala
1 5 10 15

Ala Arg Phe Ser Thr Thr Gly His Tyr Ser Val Arg Asp Gln Asp Arg
20 25 30

Val Tyr Ala Gly Val Ser Ser Thr Ser Ser Asp Phe Arg Asp Arg Pro
35 40 45

Asp Gly Val Trp Val Ala Ser Glu Gly Pro Glu Gly Asp Pro Ala Gly
50 55 60

Lys Glu Ala Glu Pro Ala Gln Pro Val Ser Ser Leu Leu Gly Ser Pro
65 70 75 80

Ala Cys Gly Pro Ile Arg Ala Gly Leu Gly Trp Val Arg Asp Gly Pro
85 90 95

Arg Ser His Pro Tyr Asn Phe Pro Ala Gly Ser Gly Gly Ser Ile Leu
100 105 110

Arg Ser Ser Ser Thr Pro Val Gln Gly Thr Val Pro Val Asp Leu Ala
115 120 125

Ser Arg Gln Glu Glu Glu Glu Gln Ser Pro Asp Ser Thr Glu Glu Glu
130 135 140

Pro Val Thr Leu Pro Arg Arg Thr Thr Asn Asp Gly Phe His Leu Leu
145 150 155 160

Lys Ala Gly Gly Ser Cys Phe Ala Leu Ile Ser Gly Thr Ala Asn Gln
165 170 175

Val Lys Cys Tyr Arg Phe Arg Val Lys Lys Asn His Arg His Arg Tyr
180 185 190

Glu Asn Cys Thr Thr Thr Trp Phe Thr Val Ala Asp Asn Gly Ala Glu
195 200 205

Arg Gln Gly Gln Ala Gln Ile Leu Ile Thr Phe Gly Ser Pro Ser Gln
210 215 220

Arg Gln Asp Phe Leu Lys His Val Pro Leu Pro Pro Gly Met Asn Ile
225 230 235 240

Ser Gly Phe Thr Ala Ser Leu Asp Phe
245

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Leu Ser Gln
1 5 10 15

Ala Gln Leu Met Pro Ser Pro Pro Met Pro Val Pro Pro Ala Ala Leu
20 25 30

Phe Asn Arg Leu Leu Asp Asp Leu Gly Phe Ser Ala Gly Pro Ala Leu
35 40 45

Cys Thr Met Leu Asp Thr Trp Asn Glu Asp Leu Phe Ser Gly Phe Pro
50 55 60

Thr Asn Ala Asp Met Tyr Arg Glu Cys Lys Phe Leu Ser Thr Leu Pro
65 70 75 80

Ser Asp Val Ile Asp Trp Gly Asp Ala His Val Pro Glu Arg Ser Pro
85 90 95

Ile Asp Ile Arg Ala His Gly Asp Val Ala Phe Pro Thr Leu Pro Ala
100 105 110

Thr Arg Asp Glu Leu Pro Ser Tyr Tyr Glu Ala Met Ala Gln Phe Phe
115 120 125

Arg Gly Glu Leu Arg Ala Arg Glu Glu Ser Tyr Arg Thr Val Leu Ala
130 135 140

Asn Phe Cys Ser Ala Leu Tyr Arg Tyr Leu Arg Ala Ser Val Arg Gln
145 150 155 160

Leu His Arg Gln Ala His Met Arg Gly Arg Asn Arg Asp Leu Arg Glu
165 170 175

Met Leu Arg Thr Thr Ile Ala Asp Arg Tyr Tyr Arg Glu Thr Ala Arg
180 185 190

Leu Ala Arg Val Leu Phe Leu His Leu Tyr Leu Phe Leu Ser Arg Glu
195 200 205

Ile Leu Trp Ala Ala Tyr Ala Glu Gln Met Met Arg Pro Asp Leu Phe
210 215 220

Asp Gly Leu Cys Cys Asp Leu Glu Ser Trp Arg Gln Leu Ala Cys Leu
225 230 235 240

Phe Gln Pro Leu Met Phe Ile Asn Gly Ser Leu Thr Val Arg Gly Val
245 250 255

Pro Val Glu Ala Arg Arg Leu Arg Glu Leu Asn His Ile Arg Glu His
260 265 270

Leu Asn Leu Pro Leu Val Arg Ser Ala Ala Ala Glu Glu Pro Gly Ala
275 280 285

Pro Leu Thr Thr Pro Pro Val Leu Gln Gly Asn Gln Ala Arg Ser Ser
290 295 300

Gly Tyr Phe Met Leu Leu Ile Arg Ala Lys Leu Asp Ser Tyr Ser Ser
305 310 315 320

Val Ala Thr Ser Glu Gly Glu Ser Val Met Arg Glu His Ala Tyr Ser
325 330 335

Arg Gly Arg Thr Arg Asn Asn Tyr Gly Ser Thr Ile Glu Gly Leu Leu
340 345 350

Asp Leu Pro Asp Asp Asp Asp Ala Pro Ala Glu Ala Gly Leu Val Ala
355 360 365

Pro Arg Met Ser Phe Leu Ser Ala Gly Gln Arg Pro Arg Arg Leu Ser
370 375 380



Thr
385

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 148 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly
1 5 10 15

Ser Gln Thr His Gln Val Ser Leu Ser Lys Gln Pro Asp Thr Gly Asn
20 25 30

Pro Cys His Thr Thr Lys Leu Leu His Arg Asp Ser Val Asp Ser Ala
35 40 45

Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys Gly Arg Ile Asn Cys
50 55 60

Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala Asn Thr
65 70 75 80

Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr Thr
85 90 95

Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys His Lys
100 105 110

Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp Gln
115 120 125

Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser Thr Gly
130 135 140

Phe Met Ser Ile
145



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser
20 25 30

Leu Ser Lys Gln Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu
35 40 45

Leu His Arg Asp Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn
50 55 60

Ser Ser His Lys Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile
65 70 75 80

Val His Leu Lys Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg Tyr Arg
85 90 95

Phe Lys Lys His Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His
100 105 110

Trp Thr Gly His Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr
115 120 125

Tyr Asp Ser Glu Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile
130 135 140

Pro Lys Thr Ile Thr Val Ser Thr Gly Phe Met Ser Ile
145 150 155

(2) INFORMATION FOR SEQ ID NO:61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 177 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

Met Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

Leu Gly Trp Val Arg Asp Gly Pro Arg Ser His Pro Tyr Asn Phe Pro
20 25 30

Ala Gly Ser Gly Gly Ser Ile Leu Arg Ser Ser Ser Thr Pro Val Gln
35 40 45

Gly Thr Val Pro Val Asp Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln
50 55 60

Ser Pro Asp Ser Thr Glu Glu Glu Pro Val Thr Leu Pro Arg Arg Thr
65 70 75 80

Thr Asn Asp Gly Phe His Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala
85 90 95

Leu Ile Ser Gly Thr Ala Asn Gln Val Lys Cys Tyr Arg Phe Arg Val
100 105 110

Lys Lys Asn His Arg His Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe
115 120 125

Thr Val Ala Asp Asn Gly Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu
130 135 140

Ile Thr Phe Gly Ser Pro Ser Gln Arg Gln Asp Phe Leu Lys His Val
145 150 155 160

Pro Leu Pro Pro Gly Met Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp
165 170 175

Phe



(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 187 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Leu Gly Trp Val Arg Asp
20 25 30

Gly Pro Arg Ser His Pro Tyr Asn Phe Pro Ala Gly Ser Gly Gly Ser
35 40 45

Ile Leu Arg Ser Ser Ser Thr Pro Val Gln Gly Thr Val Pro Val Asp
50 55 60

Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln Ser Pro Asp Ser Thr Glu
65 70 75 80

Glu Glu Pro Val Thr Leu Pro Arg Arg Thr Thr Asn Asp Gly Phe His
85 90 95

Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala Leu Ile Ser Gly Thr Ala
100 105 110

Asn Gln Val Lys Cys Tyr Arg Phe Arg Val Lys Lys Asn His Arg His
115 120 125

Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe Thr Val Ala Asp Asn Gly
130 135 140

Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu Ile Thr Phe Gly Ser Pro
145 150 155 160

Ser Gln Arg Gln Asp Phe Leu Lys His Val Pro Leu Pro Pro Gly Met
165 170 175

Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp Phe
180 185

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 143 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Asp Thr Gly Asn Pro Cys His Thr Thr
20 25 30

Lys Leu Leu His Arg Asp Ser Val Asp Ser Ala Pro Ile Leu Thr Ala
35 40 45

Phe Asn Ser Ser His Lys Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr
50 55 60

Pro Ile Val His Leu Lys Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg
65 70 75 80

Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr
85 90 95

Trp His Trp Thr Gly His Asn Val Lys His Lys Ser Ala Ile Val Thr
100 105 110

Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp Gln Phe Leu Ser Gln Val
115 120 125

Lys Ile Pro Lys Thr Ile Thr Val Ser Thr Gly Phe Met Ser Ile
130 135 140
CLAIMS

We claim:

1. A fusion protein consisting of a carboxy-terminal
cargo moiety and an amino-terminal transport moiety,
wherein

(a) the transport moiety is characterized by: 

(i) the presence of amino acids 49-57 of HIV 
tat protein; 

(ii) the absence of amino acids 22-36 of HIV 
tat protein; and 

(iii) the absence of amino acids 73-86 of HIV 
tat protein; and 

(b) the cargo moiety retains significant 
biological activity following transport moiety- 
dependent intracellular delivery. 

2. The fusion protein according to claim 1,
wherein the cargo moiety is selected from the group 
consisting of therapeutic molecules, prophylactic 
molecules and diagnostic molecules. 

3. A fusion protein consisting of a carboxy-terminal
cargo moiety and an amino-terminal transport moiety,
wherein the cargo moiety consists of a human
papillomavirus E2 repressor that retains its biological
activity after delivery into a target cell and the
transport moiety is selected from the group consisting
of:

(a) amino acids 47-58 of HIV tat protein
(SEQ ID NO: 47);

(b) amino acids 47-72 of HIV tat protein
(SEQ ID NO: 48);

(c) amino acids 38-72 of HIV tat protein
(SEQ ID NO: 49); and

(d) amino acids 38-58 of HIV tat protein
(SEQ ID NO:50).

4. The fusion protein according to claim 3, 
wherein the transport moiety is preceded by an amino- 
terminal methionine. 



5. The fusion protein according to any one 
of claims 1 to 4, wherein the cargo moiety consists of
amino acids 245-365 of the human papillomavirus E2
protein (SEQ ID NO:51).



6. Fusion protein JB106 (SEQ ID NO:38).

7. Fusion protein JB117 (SEQ ID NO: 59).

8. Fusion protein JB118 (SEQ ID NO:60).

9. Fusion protein JB122 (SEQ ID NO: 63).



10. A fusion protein consisting of a 
carboxy-terminal cargo moiety and an amino-terminal 
transport moiety, wherein the cargo moiety consists of 
a bovine papillomavirus E2 repressor that retains its 
biological activity after delivery into a target cell 
and the transport moiety is selected from the group 
consisting of: 

(a) amino acids 47-62 of HIV tat protein
(SEQ ID NO: 52); and

(b) amino acids 38-62 of HIV tat protein
(SEQ ID NO: 53).

11. The fusion protein according to claim
10, wherein the transport moiety is preceded by an 
amino-terminal methionine. 

12. The fusion protein according to any one
of claims 1, 2, 10 or 11, wherein the cargo moiety is
an E2 repressor consisting of amino acids 250-410 of
the bovine papillomavirus E2 protein (SEQ ID NO:56).

13. Fusion protein JB119 (SEQ ID NO: 61). 

14. Fusion protein JB120 (SEQ ID NO:62). 

15. A covalently linked chemical conjugate 
consisting of a transport polypeptide moiety and a 
cargo moiety, wherein: 

(a) the transport polypeptide moiety of the 
conjugate is characterized by: 

(i) the presence of amino acids 49-57 of HIV 
tat protein; 

(ii) the absence of amino acids 22-36 of HIV 
tat protein; and 

(iii) the absence of amino acids 73-86 of HIV 
tat protein; and 

(b) the cargo moiety of the conjugate retains 
significant biological activity following transport 
moiety-dependent intracellular delivery. 

16. The covalently linked chemical conjugate 
according to claim 15, wherein the transport 
polypeptide moiety consists of amino acids 37-72 of HIV 
tat protein (SEQ ID NO: 2).

17. The covalently linked chemical conjugate
according to claim 16, wherein the cargo moiety is
selected from the group consisting of:

(a) amino acids 245-365 of human papillomavirus E2
protein (SEQ ID NO:51); and

(b) amino acids 245-365 of human papillomavirus E2
protein, wherein amino acids 300 and 309 have been
changed to cysteine (SEQ ID NO: 55).

18. A covalently linked chemical conjugate 
consisting of a transport moiety and a cargo moiety, 
wherein the transport polypeptide consists of amino
acids 37-72 of HIV tat protein (SEQ ID NO:2), and the
cargo moiety is selected from the group consisting of:

(a) amino acids 245-365 of the human
papillomavirus E2 protein (SEQ ID NO:51); and 

(b) amino acids 245-365 of the human 
papillomavirus E2 protein, wherein amino acids 300 and 
309 have been changed to cysteine (SEQ ID NO: 55). 

19. A fusion protein consisting of a 
carboxy-terminal cargo moiety and an amino-terminal 
transport moiety, wherein the cargo moiety consists of 
amino acids 43-412 of HSV VP16 protein and the 
transport moiety consists of amino acids 47-58 of HIV 
tat protein. 

20. The fusion protein according to claim 
19, wherein the transport moiety is preceded by an 
amino-terminal methionine. 

21. A covalently linked chemical conjugate 
consisting of a transport polypeptide moiety and a 
cargo moiety, wherein the transport polypeptide moiety 
consists of amino acids 37-72 of HIV tat protein (SEQ
ID NO: 2) and the cargo moiety is a double-stranded DNA
selected from the group consisting of:

(a) oligonucleotide NF1 (SEQ ID NO: 43) annealed to
oligonucleotide NF2 (SEQ ID NO: 44), and 

(b) oligonucleotide NF3 (SEQ ID NO: 45) annealed to 
oligonucleotide NF4 (SEQ ID NO: 46). 
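The pairing of the oligonucleotides named in claim 21 can be checked directly from the sequence listing. The following minimal Python sketch tests whether one strand is the exact reverse complement of the other, which is what a fully base-paired, blunt-ended duplex would require. The function names are hypothetical, and treating "annealed" as exact end-to-end complementarity is an assumption of this sketch rather than a statement in the claims.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of `dna`, ignoring spaces."""
    seq = dna.replace(" ", "").upper()
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_complement(top: str, bottom: str) -> bool:
    """True if `bottom` pairs with `top` over its full length."""
    return reverse_complement(top) == bottom.replace(" ", "").upper()


# Sequences copied from the listing, written 5' to 3'.
NF1 = "GGGGACTTTC CGCTGGGGAC TTTCCACGGG GGACTTTCC"  # SEQ ID NO: 43
NF2 = "GGAAAGTCCC CCGTGGAAAG TCCCCAGCGG AAAGTCCCC"  # SEQ ID NO: 44
NF3 = "GTCTACTTTC CGCTGTCTAC TTTCCACGGT CTACTTTCC"  # SEQ ID NO: 45
NF4 = "GGAAAGTAGA CCGTGGAAAG TAGACAGCGG AAAGTAGAC"  # SEQ ID NO: 46

print(is_reverse_complement(NF1, NF2), is_reverse_complement(NF3, NF4))
```

Both comparisons print `True`: each 39-base pair is complementary over its whole length.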

22. The use of a fusion protein according to 
any one of claims 1 to 14, 19 or 20 for the
intracellular delivery of cargo. 

23. The use of a covalently linked chemical 
conjugate according to any one of claims 15 to 17 or 21 
for the intracellular delivery of cargo. 

24. A pharmaceutical composition comprising 
a pharmaceutically effective amount of a fusion protein 
according to any one of claims 1 to 14.

25. A pharmaceutical composition comprising 
a pharmaceutically effective amount of a fusion protein 
according to claim 19 or 20. 

26. A pharmaceutical composition comprising 
a pharmaceutically effective amount of a covalently 
linked chemical conjugate according to any one of 
claims 15 to 18, or 21. 

27. A DNA molecule comprising a nucleotide 
sequence encoding a fusion protein selected from the 
group consisting of: 

(a) JB106 (SEQ ID NO:38), 

(b) JB117 (SEQ ID NO: 59), 

(c) JB118 (SEQ ID NO: 60), 

(d) JB119 (SEQ ID NO:61),

(e) JB120 (SEQ ID NO:62), and

(f) JB122 (SEQ ID NO:63). 

28. A DNA molecule comprising a nucleotide
sequence encoding fusion protein tat-VP16R.GF (SEQ ID
NO:58).

29. The DNA molecule according to claim 27,
wherein the nucleotide sequence encoding the fusion
protein is operatively linked to expression control
sequences.

30. The DNA molecule according to claim 28,
wherein the nucleotide sequence encoding the fusion
protein is operatively linked to expression control
sequences.

31. A unicellular host transformed with a
DNA molecule according to claim 29.

32. A unicellular host transformed with a
DNA molecule according to claim 30.

33. A process for producing a fusion protein
selected from the group consisting of: 

(a) JB106 (SEQ ID NO:38); 

(b) JB117 (SEQ ID NO:59); 

(c) JB118 (SEQ ID NO:60); 

(d) JB119 (SEQ ID NO:61); 

(e) JB120 (SEQ ID NO:62); and 

(f) JB122 (SEQ ID NO:63); 

said method comprising the steps of:

(a) culturing a transformed unicellular host 
according to claim 31; and

(b) recovering the fusion protein from said
culture.

34. A process for producing a fusion protein 
consisting of amino acids 47-58 of HIV tat protein 
followed by amino acids 43-412 of HSV VP16 protein, 
said method comprising the steps of: 

(a) culturing a transformed unicellular host 
according to claim 32; and 

(b) recovering the fusion protein from said
culture.

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly
1 5 10 15

Ser Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys
20 25 30

Cys Phe His Cys Gln Val Cys Phe Ile Thr Lys Ala Leu Gly Ile
35 40 45

Ser Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln
50 55 60

Gly Ser Gln Thr His Gln Val Ser Leu Ser Lys Gln Pro Thr Ser
65 70 75

Gln Ser Arg Gly Asp Pro Thr Gly Pro Lys Glu
80 85
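The residue ranges recited in the claims can be read off the 86-residue tat sequence shown above. The sketch below is a minimal illustration in Python: the one-letter transcription, the constant `TAT`, and the helper `tat_region` are introduced here for the example, and the Cys count is printed only to show which regions contain cysteine.

```python
# The 86-residue tat sequence shown above, transcribed to one-letter code.
TAT = (
    "MEPVDPRLEPWKHPG"  # residues 1-15
    "SQPKTACTNCYCKKC"  # residues 16-30
    "CFHCQVCFITKALGI"  # residues 31-45
    "SYGRKKRRQRRRPPQ"  # residues 46-60
    "GSQTHQVSLSKQPTS"  # residues 61-75
    "QSRGDPTGPKE"      # residues 76-86
)


def tat_region(first: int, last: int) -> str:
    """Return tat residues first..last (1-based, inclusive)."""
    return TAT[first - 1:last]


# Ranges named in the claims and in the description of the invention.
for first, last in [(49, 57), (47, 58), (47, 72), (38, 72), (38, 58), (22, 36)]:
    region = tat_region(first, last)
    print(f"{first}-{last}: {region} (Cys count: {region.count('C')})")
```

Residues 47-58 come out as `YGRKKRRQRRRP` and residues 38-58 as `FITKALGISYGRKKRRQRRRP`, matching SEQ ID NO: 47 and SEQ ID NO: 50, and neither contains cysteine; residues 22-36 contain six.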
FIG. 12

MET TYR GLY ARG LYS LYS ARG ARG GLN ARG ARG
    47

ARG PRO PRO ASP THR GLY ASN PRO CYS HIS THR THR
    58  245

LYS LEU LEU HIS ARG ASP SER VAL ASP SER ALA PRO
255

ILE LEU THR ALA PHE ASN SER SER HIS LYS GLY ARG
267

ILE ASN CYS ASN SER ASN THR THR PRO ILE VAL HIS
279

LEU LYS GLY ASP ALA ASN THR LEU LYS CYS LEU ARG
291

TYR ARG PHE LYS LYS HIS CYS THR LEU TYR THR ALA
303

VAL SER SER THR TRP HIS TRP THR GLY HIS ASN VAL
315

LYS HIS LYS SER ALA ILE VAL THR LEU THR TYR ASP
327

SER GLU TRP GLN ARG ASP GLN PHE LEU SER GLN VAL
339

LYS ILE PRO LYS THR ILE THR VAL SER THR GLY PHE
351

MET SER ILE
        365